jonkeane commented on pull request #11225: URL: https://github.com/apache/arrow/pull/11225#issuecomment-929355830
Aaaah I see what's going on here. Ultimately, it was overly generous altrep validation code in {arrowbench} (what you suggested up above, which I'm going to add to {arrowbench} now, wouldn't have had this problem).

What was happening was: the factors/dicts were being converted, and one of their attributes is (starting with this branch) backed by an altrep representation. So when my validation only looked for `arrow::array` in the output, it found that attribute and declared that it had successfully used altrep (when it had not, at least for the bulk of the data!).

Here's the output from `.Internal(inspect(factor_array))` showing that one of the attributes (it looks like `levels`) _is_ backed by an altrep string now.

```
@7fe6607e0000 13 INTSXP g0c7 [OBJ,REF(6),ATT] (len=1000000, tl=0) 19,21,5,15,21,...
ATTRIB:
  @7fe6590bcc80 02 LISTSXP g0c0 [REF(1)]
    TAG: @7fe67f8099e0 01 SYMSXP g1c0 [MARK,REF(65535),LCK,gp=0x4000] "levels" (has value)
    @7fe6590bccb8 16 STRSXP g0c0 [REF(65535)] arrow::Array<string, 0 nulls> len=51, Array=<0x7fe69f41e318>
      @7fe6590bcd28 22 EXTPTRSXP g0c0 [REF(6)]
    TAG: @7fe67f809dd0 01 SYMSXP g1c0 [MARK,REF(33947),LCK,gp=0x4000] "class" (has value)
    @7fe689733048 16 STRSXP g1c1 [MARK,REF(65535)] (len=1, tl=0)
      @7fe69f81aed8 09 CHARSXP g1c1 [MARK,REF(367),gp=0x61] [ASCII] [cached] "factor"
```