westonpace commented on issue #10803: URL: https://github.com/apache/arrow/issues/10803#issuecomment-893106284
Sorry, it seems I missed this.

> If the schema isn't known upfront, is it possible to set the read_dictionary flag after opening the file?

I think you can get away with this. If you can't, it shouldn't be too much overhead to open the file twice: first, open it and read only the metadata to determine the columns; second, open it again and read everything.

> How can one differentiate when an Array is a DynamicArray and when it's not?

I'll assume you mean `DictionaryArray`? You can determine this from the Array's type (`arr->type()->id() == arrow::Type::DICTIONARY`, I think).

> Are the dictionaries the same for all chunks of a column, or can different chunks have different dictionaries?

I'm fairly certain both Arrow and Parquet support having different dictionaries in different chunks. It's something of a nuisance, and there are some internal utilities in Arrow for unifying dictionaries.
