zherenyu831 edited a comment on issue #1798:
URL: https://github.com/apache/hudi/issues/1798#issuecomment-657193426


   @umehrot2 @vinothchandar 
   Thank you both, and sorry for the late reply.
   
   Here are my screenshots of the Spark UI.
   
   With the first query, resolveRelation processed 950 files and took 31 
seconds:
   ```
   
spark.read.format("org.apache.hudi").load("s3://daas-hudi-test/paylite_payment_read/orders_v6/data/*/*/*").count()
   ```
   
   With the second query, below, resolveRelation processed 4750 files and 
took 2.5 minutes:
   ``` 
   
spark.read.format("org.apache.hudi").load("s3://daas-hudi-test/paylite_payment_read/orders_v6/data/*/*/*/*").count()
   ```
   
   Because a Spark streaming job is writing to this table, the file 
sizes had changed slightly by the time the second query ran.
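
   As a rough check on the two runs above, the reported numbers can be turned 
into a per-file cost. This is only back-of-the-envelope arithmetic on the 
figures in this comment. It assumes "2.5 mins" means 150 seconds, and the 
`runs` dictionary and `ms_per_file` helper are names made up for illustration, 
not part of Hudi or Spark.

```python
# Figures reported in this comment; 2.5 minutes is assumed to be 150 seconds.
runs = {
    "data/*/*/*": {"files": 950, "seconds": 31.0},
    "data/*/*/*/*": {"files": 4750, "seconds": 150.0},
}


def ms_per_file(files, seconds):
    """Average time spent in resolveRelation per processed file, in milliseconds."""
    return seconds * 1000.0 / files


for pattern, run in runs.items():
    cost = ms_per_file(run["files"], run["seconds"])
    print(f"{pattern}: {cost:.1f} ms per file")
```

   Both runs come out at a little over 30 ms per file, so the time spent in 
resolveRelation appears to grow roughly in proportion to the number of files 
the glob matches.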
   
   <img width="1665" alt="Screenshot 2020-07-12 17 51 22" 
src="https://user-images.githubusercontent.com/52404525/87242515-61857300-c468-11ea-9e23-a874afed66b8.png">
   <img width="1666" alt="Screenshot 2020-07-12 17 51 28" 
src="https://user-images.githubusercontent.com/52404525/87242522-6a764480-c468-11ea-89f1-f865875783fe.png">
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]