umehrot2 opened a new pull request #1944:
URL: https://github.com/apache/hudi/pull/1944
## What is the purpose of the pull request
The purpose of this pull request is to implement the changes required on the
Hudi side to integrate bootstrapped tables with Presto. Testing was done
against **presto 0.232**, and the following changes were identified to make it work:
- The **UseRecordReaderFromInputFormat** annotation is also required on
**HoodieParquetInputFormat**, because reads of bootstrapped tables must go
through the record reader so that it can perform the merge. Presto already
handles this annotation.
- We need to maintain `VIRTUAL_COLUMN_NAMES` internally in Hudi, because
Presto's bundled Hive version, **hive-apache-1.2.2**, defines `VirtualColumn`
as a class, whereas the Hive version Hudi depends on defines it as an **enum**.
Referencing the field from Hudi therefore results in the following error in Presto:
```
2020-08-10T21:59:58.957Z ERROR remote-task-callback-2 com.facebook.presto.execution.StageExecutionStateMachine Stage execution 20200810_215953_00006_34kqg.1.0 failed
java.lang.NoSuchFieldError: VIRTUAL_COLUMN_NAMES
	at org.apache.hudi.hadoop.HoodieParquetInputFormat.lambda$getRecordReader$2(HoodieParquetInputFormat.java:201)
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
	at org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReader(HoodieParquetInputFormat.java:203)
	at com.facebook.presto.hive.HiveUtil.createRecordReader(HiveUtil.java:253)
	at com.facebook.presto.hive.GenericHiveRecordCursorProvider.lambda$createRecordCursor$0(GenericHiveRecordCursorProvider.java:74)
	at com.facebook.presto.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:360)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1824)
	at com.facebook.presto.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)
	at com.facebook.presto.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)
	at com.facebook.presto.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:82)
	at com.facebook.presto.hive.GenericHiveRecordCursorProvider.createRecordCursor(GenericHiveRecordCursorProvider.java:73)
	at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:374)
	at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:137)
	at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:113)
	at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:52)
```
- Dependency changes in `hudi-presto-bundle` to avoid runtime exceptions.
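To illustrate why the annotation matters, here is a minimal sketch of how a query engine could decide, via reflection, whether to read through an input format's own `RecordReader`. It assumes the engine matches the annotation by its simple name. The annotation and both input format classes below are hypothetical stand-ins, not Hudi's or Presto's actual classes.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

public class RecordReaderAnnotationSketch {

    // Hypothetical stand-in for the marker annotation. RUNTIME retention is
    // required so the annotation is visible through reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface UseRecordReaderFromInputFormat {}

    // Hypothetical input formats: only the annotated one asks the engine to
    // read through the input format's own record reader.
    @UseRecordReaderFromInputFormat
    static class AnnotatedParquetInputFormat {}

    static class PlainParquetInputFormat {}

    // Assumed detection strategy: compare annotation simple names, so the
    // engine does not need the annotation class itself on its classpath.
    static boolean shouldUseRecordReaderFromInputFormat(Class<?> inputFormatClass) {
        return Arrays.stream(inputFormatClass.getAnnotations())
                .map(Annotation::annotationType)
                .map(Class::getSimpleName)
                .anyMatch("UseRecordReaderFromInputFormat"::equals);
    }

    public static void main(String[] args) {
        System.out.println(shouldUseRecordReaderFromInputFormat(AnnotatedParquetInputFormat.class));
        System.out.println(shouldUseRecordReaderFromInputFormat(PlainParquetInputFormat.class));
    }
}
```

Running `main` prints `true` followed by `false`: without the annotation, an engine following this logic would fall back to its own reader and skip the merge that the input format's record reader performs.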
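A minimal sketch of keeping the virtual column names on the Hudi side instead of reading `VirtualColumn.VIRTUAL_COLUMN_NAMES` from Hive. The stack trace shows that field being read inside a stream in `HoodieParquetInputFormat.getRecordReader`; here a locally owned set is used to filter a projection list. The class name, the helper `nonVirtualColumns`, and the exact list of names are assumptions for illustration and should be checked against the Hive version being compiled against.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class VirtualColumnsSketch {

    // Hypothetical Hudi-owned copy of Hive's virtual column names. Keeping it
    // local avoids linking against VirtualColumn.VIRTUAL_COLUMN_NAMES, which
    // is missing at runtime under Presto's hive-apache-1.2.2.
    // Assumed list: verify it against the Hive version in use.
    static final Set<String> VIRTUAL_COLUMN_NAMES = Collections.unmodifiableSet(
            new LinkedHashSet<>(Arrays.asList(
                    "INPUT__FILE__NAME",
                    "BLOCK__OFFSET__INSIDE__FILE",
                    "ROW__OFFSET__INSIDE__BLOCK",
                    "RAW__DATA__SIZE",
                    "ROW__ID",
                    "GROUPING__ID")));

    // Hypothetical helper: drops virtual columns from a comma-separated
    // projection list (for example, the value of "hive.io.file.readcolumn.names").
    static List<String> nonVirtualColumns(String readColumnNames) {
        return Arrays.stream(readColumnNames.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .filter(name -> !VIRTUAL_COLUMN_NAMES.contains(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints [_hoodie_record_key, id, name]
        System.out.println(nonVirtualColumns("_hoodie_record_key,id,INPUT__FILE__NAME,name"));
    }
}
```

Because the set is a plain `Set<String>` owned by Hudi, this code does not depend on whether Hive's `VirtualColumn` is a class or an enum at runtime.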
## Brief change log
## Verify this pull request
The changes have been tested on **emr-5.30.1** against **presto 0.232**.
## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.