yh2388 commented on issue #2609:
URL: https://github.com/apache/hudi/issues/2609#issuecomment-1139392503

   > @lw309637554 thanks for your response.
   > 
   > **_1. About the first attempt: Parquet takes 23 secs but Hudi takes 40 secs; I see metadata init costs some time in the log._** Yes, the two major costs are in metadata loading. Is that expected, or can it be optimized?
   > 
   > About 20 seconds are spent in this section: `2021-02-27T04:27:24.714Z INFO hive-hive-18 org.apache.hudi.hadoop.utils.HoodieInputFormatUtils Total paths to process after hoodie filter 691` `2021-02-27T04:27:45.364Z INFO hive-hive-17 org.apache.hudi.hadoop.utils.HoodieInputFormatUtils Reading hoodie metadata from path s3a://my-test-bucket/tmp/ramesh/hudi_0_7_cl2/sample_data`
   > 
   > Another 15 seconds go here: `2021-02-27T04:27:46.360Z INFO hive-hive-17 org.apache.hudi.hadoop.utils.HoodieInputFormatUtils Total paths to process after hoodie filter 623` `2021-02-27T04:28:02.931Z DEBUG query-execution-16 io.prestosql.execution.StageStateMachine Stage 20210227_042722_00016_9dket.2 is SCHEDULED`
   > 
   > **2. About the second attempt: Parquet is very fast; maybe Presto supports a local cache for the Parquet format.** It seems like local caching. I will look into how Presto's local cache works.
   > 
   > **3. Also, the Parquet and Hudi table results are not equal?** Both are the same dataset. Sorry, the result order is not maintained. The Hudi dataset has 151 fewer rows because duplicate rows were eliminated during ingestion.
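   For context on point 3 above: Hudi's upsert/insert-dedup path collapses rows that share the same record key, typically keeping the row with the largest precombine value. This is a minimal Python sketch of that semantics only, not Hudi's actual implementation; the field names `key` and `ts` are hypothetical placeholders.

   ```python
   # Sketch of record-key deduplication as performed during Hudi ingestion:
   # for rows sharing a record key, keep only the one with the largest
   # precombine value (here the hypothetical field "ts").
   def dedup_by_key(rows, key_field="key", precombine_field="ts"):
       latest = {}
       for row in rows:
           k = row[key_field]
           if k not in latest or row[precombine_field] > latest[k][precombine_field]:
               latest[k] = row
       return list(latest.values())

   rows = [
       {"key": "a", "ts": 1, "val": "old"},
       {"key": "a", "ts": 2, "val": "new"},
       {"key": "b", "ts": 1, "val": "only"},
   ]
   deduped = dedup_by_key(rows)
   # Three input rows collapse to two, which is why a Hudi table can hold
   # fewer rows than the raw Parquet dataset it was ingested from.
   ```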
   
   We have the same problem. Has it been solved? We would appreciate your help.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
