jonkeane commented on pull request #11225:
URL: https://github.com/apache/arrow/pull/11225#issuecomment-929355830


   Aaaah I see what's going on here. 
   
   Ultimately, it was overly generous altrep validation code in {arrowbench}. 
(What you suggested up above, which I'm going to add to {arrowbench} now, 
wouldn't have had this problem.) What was happening was: the factors/dicts were 
being converted, and inside of them one of the attributes is (starting with 
this branch) backed by an altrep representation. So when my validation only 
looked for `arrow::array` in the output, it found that attribute and declared 
that it had successfully used altrep (when it had not, at least for the bulk of 
the data!). 
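
A rough sketch of the difference between the two kinds of check, assuming the validation works on the text printed by `.Internal(inspect(x))`. The helper names here are hypothetical and are not the actual {arrowbench} code; the case-insensitive match is also an assumption, made so that the pattern `arrow::array` matches the `arrow::Array<...>` label that the inspect output prints.

```r
# Hypothetical helpers; names are illustrative and not part of {arrowbench}.

# Capture the text that .Internal(inspect(x)) prints, one element per line.
inspect_lines <- function(x) {
  utils::capture.output(.Internal(inspect(x)))
}

# Naive check: TRUE if any line mentions an Arrow-backed vector, even when
# that line describes an attribute rather than the data itself.
any_arrow_altrep <- function(lines) {
  any(grepl("arrow::array", lines, ignore.case = TRUE))
}

# Stricter check: look only at the first line, which describes the
# top-level object.
top_level_arrow_altrep <- function(lines) {
  length(lines) > 0 && grepl("arrow::array", lines[[1]], ignore.case = TRUE)
}

# Abbreviated lines modelled on the inspect output for the factor.
dump <- c(
  "@7fe6607e0000 13 INTSXP g0c7 [OBJ,REF(6),ATT] (len=1000000, tl=0) 19,21,5,15,21,...",
  "ATTRIB:",
  "      @7fe6590bccb8 16 STRSXP g0c0 [REF(65535)] arrow::Array<string, 0 nulls> len=51"
)

any_arrow_altrep(dump)        # TRUE: fooled by the `levels` attribute
top_level_arrow_altrep(dump)  # FALSE: the integer codes are a plain INTSXP
```

On a real object the same checks would be applied to `inspect_lines(factor_array)`.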
   
   Here's the output from `.Internal(inspect(factor_array))`, showing that one 
attribute (it looks like `levels`) _is_ backed by an altrep string now.
   ```
   @7fe6607e0000 13 INTSXP g0c7 [OBJ,REF(6),ATT] (len=1000000, tl=0) 19,21,5,15,21,...
   ATTRIB:
    @7fe6590bcc80 02 LISTSXP g0c0 [REF(1)] 
      TAG: @7fe67f8099e0 01 SYMSXP g1c0 [MARK,REF(65535),LCK,gp=0x4000] "levels" (has value)
      @7fe6590bccb8 16 STRSXP g0c0 [REF(65535)] arrow::Array<string, 0 nulls> len=51, Array=<0x7fe69f41e318>
        @7fe6590bcd28 22 EXTPTRSXP g0c0 [REF(6)] 
      TAG: @7fe67f809dd0 01 SYMSXP g1c0 [MARK,REF(33947),LCK,gp=0x4000] "class" (has value)
      @7fe689733048 16 STRSXP g1c1 [MARK,REF(65535)] (len=1, tl=0)
        @7fe69f81aed8 09 CHARSXP g1c1 [MARK,REF(367),gp=0x61] [ASCII] [cached] "factor"
   ```



