I have started to look through this. I think we're going to need to do some
work on the design of the tokenizer hot path. I wrote the tokenizer that pandas
uses, for example, and I probably wouldn't use the same design again, so we
have other data points to compare with. Luckily we have benchmarks and tests,
so we can refactor at will to try out different things and analyze that part in
more depth.

[ Full content available at: https://github.com/apache/arrow/pull/2576 ]
