westonpace edited a comment on pull request #8984:
URL: https://github.com/apache/arrow/pull/8984#issuecomment-758347141


   @jorisvandenbossche It's pretty close but there are a few differences.
   
   - The pandas code allows the index type to expand (e.g. from uint8_t to 
uint16_t).  In fact, it looks like it always sets it to int32_t.  Also, Arrow 
doesn't allow dictionary indices to be negative.
   - The pandas code puts -1 in the map for a null value.  Arrow marks the 
slot as null in the validity bitmap of the indices array and/or stores null as 
an item in the dictionary itself, referenced by a valid index.  Both Arrow 
approaches are legal, but the pandas approach is neither of those.
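
To make the second difference concrete, here is a rough sketch in plain
Python lists (no pandas or Arrow calls). The helper names
`pandas_codes_to_arrow_indices` and `decode` are hypothetical and exist only
for this illustration; neither is code from this PR or from either library.

```python
from typing import List, Optional, Tuple


def pandas_codes_to_arrow_indices(codes: List[int]) -> Tuple[List[int], List[bool]]:
    """Hypothetical helper: split pandas-style codes into (indices, validity).

    pandas-style codes use -1 as the null sentinel. The Arrow-style result
    keeps every index non-negative and records nulls in a separate validity
    list (standing in for the validity bitmap).
    """
    indices: List[int] = []
    validity: List[bool] = []
    for code in codes:
        if code < -1:
            raise ValueError(f"unexpected negative code: {code}")
        is_valid = code != -1
        validity.append(is_valid)
        # The index under a null slot is never read; 0 is used as a placeholder.
        indices.append(code if is_valid else 0)
    return indices, validity


def decode(dictionary: List[Optional[str]],
           indices: List[int],
           validity: List[bool]) -> List[Optional[str]]:
    """Resolve indices against the dictionary, honouring the validity list."""
    return [dictionary[i] if ok else None for i, ok in zip(indices, validity)]


# Arrow approach 1: null recorded in the validity list of the indices.
idx, valid = pandas_codes_to_arrow_indices([0, -1, 1])
via_bitmap = decode(["a", "b"], idx, valid)

# Arrow approach 2: null stored as a dictionary item with a valid index.
via_dictionary = decode(["a", "b", None], [0, 2, 1], [True, True, True])
```

Both `via_bitmap` and `via_dictionary` decode to the same logical values,
while the original `-1` sentinel appears in neither representation.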
   
   I'll defer to @pitrou on whether we want to combine them, but it seems 
simpler to just leave them separate for now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

