[GitHub] [arrow] westonpace commented on issue #10803: Reading strings efficiently in C++

GitBox Wed, 04 Aug 2021 19:06:55 -0700


westonpace commented on issue #10803:
URL: https://github.com/apache/arrow/issues/10803#issuecomment-893106284



   Sorry, it seems I missed this.
   
   > If the schema isn't known upfront, is it possible to set the 
read_dictionary flag after opening the file?
   
   I think you can get away with this.  If you can't, it shouldn't add too 
much overhead to open the file twice: first, open it and read only the metadata 
to determine the columns; second, open it again and read everything.
   
   > How can one differentiate when an Array is a DynamicArray and when it's 
not?
   
   I'll assume you mean `DictionaryArray`?  You can determine this from the 
Array's type (`arr->type()->id() == arrow::Type::DICTIONARY`, I think).
   
   > Are the dictionaries the same for all chunks of a column, or can different 
chunks have different dictionaries?
   
   I'm fairly certain both Arrow and Parquet support having different 
dictionaries in different chunks.  It's something of a nuisance and there are 
some internal utilities in Arrow for unifying dictionaries.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
