RussellSpitzer commented on issue #2302: URL: https://github.com/apache/iceberg/issues/2302#issuecomment-802829643
I have some thoughts, but we still really need more info. If you have a chance, could you capture the Spark UI for the job, in particular the stage in which the table is read? My main guess is that the time is actually being spent on something at the Spark level, such as scheduling delay. It really doesn't make sense for the time to go to reading the files if changing the compression has no effect; I've seen the difference between snappy and gzip be 2x, and your data is small enough that the read should be practically instantaneous. So I'd like to see the stage/job UI and possibly the table creation statement.
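
For reference, a minimal sketch of setting the Parquet compression codec on an Iceberg table so the snappy-vs-gzip comparison is meaningful. The catalog and table names here are hypothetical; `write.parquet.compression-codec` is the standard Iceberg write property:

```scala
// Hypothetical catalog/table names; sets the codec used for newly written Parquet files.
spark.sql("""
  CREATE TABLE my_catalog.db.events (
    id BIGINT,
    ts TIMESTAMP,
    payload STRING)
  USING iceberg
  TBLPROPERTIES ('write.parquet.compression-codec' = 'snappy')
""")

// Or switch an existing table to gzip before rewriting data, to compare scan times:
spark.sql("""
  ALTER TABLE my_catalog.db.events
  SET TBLPROPERTIES ('write.parquet.compression-codec' = 'gzip')
""")
```

Note that changing the property only affects files written afterwards; existing data files keep their original compression until rewritten.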
