RussellSpitzer commented on issue #2302:
URL: https://github.com/apache/iceberg/issues/2302#issuecomment-802829643


   I have some thoughts, but we still really need more info. If you have a 
chance, could you capture the Spark UI for the job, in particular the stage in 
which the table is read? My main guess is that the actual time is spent on 
something like scheduling delay or some other Spark-level overhead. It 
really doesn't make sense that the time would go to reading the files if 
changing the compression has no effect; I've seen the difference between 
snappy and gzip be 2x. Also, your data is so small that reading it should be 
practically instantaneous.
   
   
   So I would like to see the stage/job UI and possibly the table creation 
statement.
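
   For reference, a minimal sketch of the kind of table creation statement 
being asked about, assuming a Spark 3.x session with the Iceberg runtime on 
the classpath and an Iceberg catalog already configured (catalog, database, 
table, and column names here are placeholders, not taken from this issue):

```scala
// Sketch only: create an Iceberg table and pin the Parquet compression codec.
spark.sql("""
  CREATE TABLE my_catalog.db.events (
    id BIGINT,
    ts TIMESTAMP,
    payload STRING)
  USING iceberg
  TBLPROPERTIES (
    -- codec used for the Parquet data files written to this table
    'write.parquet.compression-codec' = 'gzip')
""")

// The property can be inspected or switched later, e.g. to compare codecs:
spark.sql("SHOW TBLPROPERTIES my_catalog.db.events").show(truncate = false)
spark.sql("""
  ALTER TABLE my_catalog.db.events
  SET TBLPROPERTIES ('write.parquet.compression-codec' = 'snappy')
""")
```

   Seeing the actual statement would confirm which codec and file format the 
table is using before digging into the Spark stages.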

