tnederlof commented on issue #15056: URL: https://github.com/apache/arrow/issues/15056#issuecomment-1381011491
That's kind of surprising to me that it takes that long to open the dataset; I suspect all the partitioning is causing issues. I was able to replicate the issue you faced using the same partitioning structure (I just faked 24x the data with different intervals). Then I tried saving all of the data in a single Parquet file (it's about 1 GB), and now it runs in <0.5 s instead of 8-9 s. Could you please try saving the data as non-Hive-partitioned Parquet file(s)?
