arthurpassos commented on code in PR #35825:
URL: https://github.com/apache/arrow/pull/35825#discussion_r1210580352
##########
cpp/src/arrow/array/builder_dict.h:
##########
@@ -724,6 +747,7 @@ using BinaryDictionaryBuilder = DictionaryBuilder<BinaryType>;
 using StringDictionaryBuilder = DictionaryBuilder<StringType>;
 using BinaryDictionary32Builder = Dictionary32Builder<BinaryType>;
 using StringDictionary32Builder = Dictionary32Builder<StringType>;
+using BinaryDictionary64Builder = Dictionary64Builder<LargeBinaryType>;

Review Comment:
   Hi @pitrou. First of all, thanks for looking into this.

   I am trying to fix the issue described in https://github.com/apache/arrow/issues/32723. The issue appears when data in complex data structures ends up being chunked. The goal of this PR is to introduce a setting that allows the use of the LARGE* variants of the string / binary types to avoid chunking, as suggested by @emkornfield.

   I simply followed my intuition that if the non-large types rely on 32 bits, the large types would rely on 64 bits.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
