xhochy commented on pull request #7357:
URL: https://github.com/apache/arrow/pull/7357#issuecomment-642456798


   > * should Jira issues and PR's be 1 tot 1, or can I open multiple PR's 
referencing the same Jira issue
   
   At least one JIRA pr PR, everything should have an issue number. If there 
isn't a need for more explanation, a title with correct tagging is enough. It 
would be helpful to have such JIRAs also to coordinate who is working which 
task.
   
   > * 1 commit per PR, or multiple commits per PR if related, but each commit 
one idea/topic/functionlity?
   
   As many commits as it makes sense to review, we will squash on merge. The 
important detail in the past that was annoying (also can be helpful) is that 
reviewers will not be notified for changes if you force-push.
   
   
   > I have to dive a bit into how the kernels work, but with the current 
implementation of ascii_lower and upper I see performance issues (we should be 
able to write into a pre-allocated buffer, since we know the byte length). I'm 
happy first getting some basic functionality in, and optimize later, it will 
surface the issues around the framework a bit more I think.
   
   I'm also quite concerned about this performance penalty. But for e.g. 
`utf8_upper` (let's just call it `upper`), I expect that we cannot pre-allocate 
and thus these are the things where we could continue to as-is.
   
   I'm happy to merge this PR (with a new JIRA id), we should though work on 
writing into pre-allocated memory before writing more kernels that would 
greatly benefit from pre-allocation. For optimization in the pre-allocated 
case, this should be a sufficient baseline. I'm more concerned about 
performance when we start to implement UTF-8 based functionality as we quickly 
get to the point where we need to look at multiple bytes at a time but don't 
know ahead whether it's 1, 2, 3, or 4. There we probably should implement 
kernels with decent performance and one we have a set of 5-10, we can look on 
how to improve performance there.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to