jorisvandenbossche edited a comment on pull request #9702: URL: https://github.com/apache/arrow/pull/9702#issuecomment-986788908
> Small note: I see that in one of your recent commits, you changes all lower case (eg `enable_dictionary`) to camel case (`GetEnableDictionary`). I am not fully sure, but I think we actually do use lower case for setter/getter methods elsewhere (https://google.github.io/styleguide/cppguide.html#Function_Names also mentions this is allowed) Related to this, @pitrou was mentioning that using a simpler struct might be a better way to go, so we don't need those getter/setter methods at al. One example to look at are the S3Options (https://github.com/apache/arrow/blob/a71c1e6b19d0a0a735c7bc014211b63c23df56a6/cpp/src/arrow/filesystem/s3fs.h#L96-L203), which have a mixture of plain struct fields for the main options, with some additional methods (and later on this struct is used to pass configuration to AWS SDK lib to create a client object) > As for the `stripe_size` issue it is more complex than what's in your example which I do manage to replicate. I think the main "mystery" is in how the exact value of `stripe_size` is interpreted (I assume it is bytes, but I would think that 1024 floats, even compressed, would be more than 1 byte). Because if I increase the size of both the table and batch size, it actually works as expected: ```python >>> table = pa.table({'a': np.random.randn(1024 * 10 * 40 + 1)}) >>> orc.write_table(table, "test_orc_size.orc", batch_size=1024 * 10, stripe_size=1) >>> orc.ORCFile("test_orc_size.orc").nstripes 41 ``` And also with the smaller batch_size of 1024 it works if I edit the cython wrapper to allow a stripe size of 0 (which basically then means: create a stripe for each batch): ```python >>> table = pa.table({'a': np.random.randn(1024 * 40 + 1)}) >>> orc.write_table(table, "test_orc_size.orc", batch_size=1024, stripe_size=0) >>> orc.ORCFile("test_orc_size.orc").nstripes 41 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
