jorisvandenbossche edited a comment on pull request #9702:
URL: https://github.com/apache/arrow/pull/9702#issuecomment-986788908


   > Small note: I see that in one of your recent commits, you changes all 
lower case (eg `enable_dictionary`) to camel case (`GetEnableDictionary`). I am 
not fully sure, but I think we actually do use lower case for setter/getter 
methods elsewhere 
(https://google.github.io/styleguide/cppguide.html#Function_Names also mentions 
this is allowed)
   
   Related to this, @pitrou was mentioning that using a simpler struct might be 
a better way to go, so we don't need those getter/setter methods at al. One 
example to look at are the S3Options 
(https://github.com/apache/arrow/blob/a71c1e6b19d0a0a735c7bc014211b63c23df56a6/cpp/src/arrow/filesystem/s3fs.h#L96-L203),
 which have a mixture of plain struct fields for the main options, with some 
additional methods (and later on this struct is used to pass configuration to 
AWS SDK lib to create a client object)
   
   
   > As for the `stripe_size` issue it is more complex than what's in your 
example which I do manage to replicate.
   
   I think the main "mystery" is in how the exact value of `stripe_size` is 
interpreted (I assume it is bytes, but I would think that 1024 floats, even 
compressed, would be more than 1 byte). 
   Because if I increase the size of both the table and batch size, it actually 
works as expected:
   
   ```python
   >>> table = pa.table({'a': np.random.randn(1024 * 10 * 40 + 1)})
   >>> orc.write_table(table, "test_orc_size.orc", batch_size=1024 * 10, 
stripe_size=1)
   >>> orc.ORCFile("test_orc_size.orc").nstripes
   41
   ```
   
   And also with the smaller batch_size of 1024 it works if I edit the cython 
wrapper to allow a stripe size of 0 (which basically then means: create a 
stripe for each batch):
   
   ```python
   >>> table = pa.table({'a': np.random.randn(1024 * 40 + 1)})
   >>> orc.write_table(table, "test_orc_size.orc", batch_size=1024, 
stripe_size=0)
   >>> orc.ORCFile("test_orc_size.orc").nstripes
   41
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to