jmgpeeters opened a new pull request #9629: URL: https://github.com/apache/arrow/pull/9629
The only code change required, AFAICT, is the calculation of num_dicts, which is no longer simply the number of fields, but rather the number of unique dictionary IDs those fields point to. I'm calculating this on demand, as it's quite cheap and not frequently called, but it could also be recomputed and cached on every addField.

For now, I've added tests that read materialised data generated from Java, as C++ doesn't yet support writing IPC with shared dictionaries either (and that is out of scope here).

Down the line, I would like full read and write support for shared dictionaries across at least C++, Python, Java and Julia, so I'll be coming back to this with follow-up PRs where needed. As part of that, I'll also change the tests to use the round-trip mechanism instead of relying on materialised files.
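
To illustrate the num_dicts change described above, here is a minimal, self-contained sketch. It is not the code from this PR: DictMapperSketch, AddField, and the FieldPath alias are hypothetical stand-ins, and the sketch assumes each dictionary-encoded field is registered under a path together with a dictionary ID that several fields may share.

```cpp
#include <cstdint>
#include <map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical model of a field path: the child indices leading to a field.
using FieldPath = std::vector<int>;

// Hypothetical stand-in for a mapper from field paths to dictionary IDs.
// It is only an illustration, not the Arrow class itself.
class DictMapperSketch {
 public:
  // Register a dictionary-encoded field under a dictionary ID.
  // Several fields may be registered with the same ID (a shared dictionary).
  void AddField(int64_t dict_id, FieldPath path) {
    field_path_to_id_[std::move(path)] = dict_id;
  }

  // Number of dictionary-encoded fields that were registered.
  int num_fields() const {
    return static_cast<int>(field_path_to_id_.size());
  }

  // Number of distinct dictionaries: count unique IDs on demand
  // rather than assuming one dictionary per field.
  int num_dicts() const {
    std::unordered_set<int64_t> unique_ids;
    for (const auto& kv : field_path_to_id_) {
      unique_ids.insert(kv.second);
    }
    return static_cast<int>(unique_ids.size());
  }

 private:
  std::map<FieldPath, int64_t> field_path_to_id_;
};
```

With three fields registered, two of which share dictionary ID 0, num_fields() is 3 while num_dicts() is 2. Computing the set on each call costs time linear in the number of fields, which matches the "cheap and not frequently called" trade-off mentioned above; caching the count on each AddField would be the alternative.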
