umehrot2 opened a new pull request #1944:
URL: https://github.com/apache/hudi/pull/1944


   ## What is the purpose of the pull request
   
   The purpose of this pull request is to implement the changes required on the Hudi side to integrate bootstrapped tables with Presto. Testing was done against **presto 0.232**, and the following changes were identified as necessary to make it work:
   
   - The **UseRecordReaderFromInputFormat** annotation is required on **HoodieParquetInputFormat** as well, because bootstrapped tables must be read through the record reader so that the merge can be performed. Presto already handles this annotation.
   
   - We need to maintain `VIRTUAL_COLUMN_NAMES` internally because Presto's internal Hive version, **hive-apache-1.2.2**, defines `VirtualColumn` as a class, whereas the Hive version Hudi depends on defines it as an **enum**. This mismatch results in the following error in Presto:
   ```
   2020-08-10T21:59:58.957Z     ERROR   remote-task-callback-2  com.facebook.presto.execution.StageExecutionStateMachine        Stage execution 20200810_215953_00006_34kqg.1.0 failed
   java.lang.NoSuchFieldError: VIRTUAL_COLUMN_NAMES
        at org.apache.hudi.hadoop.HoodieParquetInputFormat.lambda$getRecordReader$2(HoodieParquetInputFormat.java:201)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
        at org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReader(HoodieParquetInputFormat.java:203)
        at com.facebook.presto.hive.HiveUtil.createRecordReader(HiveUtil.java:253)
        at com.facebook.presto.hive.GenericHiveRecordCursorProvider.lambda$createRecordCursor$0(GenericHiveRecordCursorProvider.java:74)
        at com.facebook.presto.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:360)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1824)
        at com.facebook.presto.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)
        at com.facebook.presto.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)
        at com.facebook.presto.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:82)
        at com.facebook.presto.hive.GenericHiveRecordCursorProvider.createRecordCursor(GenericHiveRecordCursorProvider.java:73)
        at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:374)
        at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:137)
        at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:113)
        at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:52)
   ```
   - Dependency changes in `hudi-presto-bundle` to avoid runtime exceptions.
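
   The annotation requirement in the first item can be illustrated with a small, self-contained sketch. It assumes, as we understand Presto's `HiveUtil` to do, that the engine detects the marker by reflecting over the input format's class annotations and comparing simple names. The annotation declared here is a local stand-in for Hudi's, and `AnnotationCheckSketch`, `BootstrapAwareInputFormat`, `PlainInputFormat` and `shouldUseRecordReaderFromInputFormat` are hypothetical names, not Hudi or Presto code.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

public class AnnotationCheckSketch {
    // Local stand-in for Hudi's marker annotation (assumption: RUNTIME retention,
    // which is required for the annotation to be visible via reflection).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface UseRecordReaderFromInputFormat {}

    // Hypothetical stand-in for HoodieParquetInputFormat carrying the marker.
    @UseRecordReaderFromInputFormat
    static class BootstrapAwareInputFormat {}

    // Hypothetical input format without the marker.
    static class PlainInputFormat {}

    // Matches by simple name, so the caller does not need the annotation class
    // on its compile-time classpath (assumed to mirror the engine-side check).
    static boolean shouldUseRecordReaderFromInputFormat(Object inputFormat) {
        return Arrays.stream(inputFormat.getClass().getAnnotations())
                .map(Annotation::annotationType)
                .map(Class::getSimpleName)
                .anyMatch("UseRecordReaderFromInputFormat"::equals);
    }

    public static void main(String[] args) {
        // true: the annotated format routes reads through its own record reader
        System.out.println(shouldUseRecordReaderFromInputFormat(new BootstrapAwareInputFormat()));
        // false: the engine would use its native reader instead
        System.out.println(shouldUseRecordReaderFromInputFormat(new PlainInputFormat()));
    }
}
```

   Because the check only looks at the annotation's simple name, adding the annotation to **HoodieParquetInputFormat** is enough for the engine side to pick it up, which is consistent with the statement that Presto already handles it.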
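
   The `VIRTUAL_COLUMN_NAMES` item amounts to keeping a local copy of the virtual column names, so that the input format never resolves the static field on Hive's `VirtualColumn` type at runtime. A `NoSuchFieldError` is raised when that field reference cannot be linked against the `VirtualColumn` found on Presto's classpath. The sketch below is not the PR's code: `VirtualColumnFilterSketch` and `dropVirtualColumns` are hypothetical, the column names are assumed to match Hive's standard virtual columns, and the sample column list is illustrative only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class VirtualColumnFilterSketch {
    // Local copy of Hive's virtual column names (assumed list). Keeping it here avoids
    // a runtime reference to VirtualColumn.VIRTUAL_COLUMN_NAMES, whose owner type
    // differs between the Hive version Hudi builds against and Presto's hive-apache.
    static final Set<String> VIRTUAL_COLUMN_NAMES = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "INPUT__FILE__NAME",
            "BLOCK__OFFSET__INSIDE__FILE",
            "ROW__OFFSET__INSIDE__BLOCK",
            "RAW__DATA__SIZE",
            "GROUPING__ID",
            "ROW__ID")));

    // Removes virtual columns from a projected column list, preserving order.
    static List<String> dropVirtualColumns(List<String> columnNames) {
        return columnNames.stream()
                .filter(name -> !VIRTUAL_COLUMN_NAMES.contains(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative projection: one Hudi metadata column, one virtual column, two data columns.
        List<String> projected = Arrays.asList("_hoodie_commit_time", "INPUT__FILE__NAME", "rider", "fare");
        System.out.println(dropVirtualColumns(projected)); // [_hoodie_commit_time, rider, fare]
    }
}
```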
   
   ## Brief change log
   
   ## Verify this pull request
   
   The changes have been tested on **emr-5.30.1** against **presto 0.232**.
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure.
