jonkeane commented on PR #13070:
URL: https://github.com/apache/arrow/pull/13070#issuecomment-1126151870

   > Initially I started by trying to apply an identical logic to 
as.Date.character(): find the first non-null element and attempt to parse it 
with the first format, then move on to the next format, etc. Computationally it 
should be relatively efficient since we would try the formats on a single 
element, but I couldn't figure out how to do this with expressions (find the 
first non-null, apply the format, force the computation, if NA, try the next 
one, etc).
   
   Your description of how base R accomplishes this is good, and as you 
mention, iterating through the data (even to get a single element) might be 
complicated or problematic.
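   
   For reference, the base R behaviour described above can be sketched in 
plain R roughly as follows. This is only an illustration on an ordinary 
character vector, not on Arrow Arrays or Expressions; `guess_date_format()` is 
a hypothetical helper name, and the two candidate formats are assumed to match 
the defaults tried by `as.Date.character()`.

   ```r
   # Hypothetical helper: pick the first format that parses the first non-NA element.
   guess_date_format <- function(x, try_formats = c("%Y-%m-%d", "%Y/%m/%d")) {
     non_missing <- x[!is.na(x)]
     if (length(non_missing) == 0L) {
       # Nothing to inspect, so fall back to the first candidate format.
       return(try_formats[[1L]])
     }
     first_value <- non_missing[[1L]]
     for (fmt in try_formats) {
       # strptime() returns NA when the string does not match the format.
       if (!is.na(strptime(first_value, fmt, tz = "UTC"))) {
         return(fmt)
       }
     }
     stop("character string is not in a standard unambiguous format")
   }

   x <- c(NA, "2022/05/13", "2022/05/14")
   fmt <- guess_date_format(x)    # only one element is inspected
   dates <- as.Date(x, format = fmt)  # the chosen format is applied to everything
   ```

   The point of the sketch is that the format is decided from a single 
element and then applied to the whole vector, which is the part that is hard 
to express lazily.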
   
   The next step I would take is to explore what functionality is available 
(you seem to have found `IsValid()`, which is good). You'll need to look at both 
Arrays and Expressions, since you could get either in a dplyr query like this 
(are there any other object types you might encounter there?). And then I would 
try to construct a way to get the format and then apply it. In this exploration 
it's ok to have unreasonable code, or code that we might not want to ship, to 
see if it's possible. This type of data introspection is not common in our 
bindings, but you might look around to see if there are any others that do 
something similar (I don't know of any off the top of my head, but it's possible 
that one exists).
   
   Another approach might be to try other methods of constructing expressions 
that don't use coalesce but instead produce multiple columns, from which you 
select one at the end of the expression.
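   
   As a rough illustration of that idea, here is one way it could look in 
plain R. A base R data frame stands in for an Arrow query here, so this says 
nothing about whether the same thing can be built from Expressions; the 
candidate formats and the rule for picking a column are assumptions made for 
the sketch.

   ```r
   # Assumed candidate formats; a base R data frame stands in for an Arrow query.
   try_formats <- c("%Y-%m-%d", "%Y/%m/%d")
   df <- data.frame(x = c(NA, "2022/05/13", "2022/05/14"))

   # One candidate column per format: each format is applied to the whole column.
   candidates <- lapply(try_formats, function(fmt) as.Date(df$x, format = fmt))

   # Select at the end: keep the first candidate that parsed at least one value,
   # falling back to the first candidate if none parsed anything.
   parsed_any <- vapply(candidates, function(col) any(!is.na(col)), logical(1))
   chosen <- if (any(parsed_any)) which(parsed_any)[1L] else 1L
   df$date <- candidates[[chosen]]
   ```

   Unlike coalesce, which can mix formats row by row, this keeps a single 
format for the whole column, at the cost of parsing the data once per format.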
   
   It's entirely possible that this type of introspection + format selection 
will not be feasible in R-only code either. But as always, it's helpful to 
explore and see + have something that ~works, even if it's not what we want, so 
that we can create tests + implement it in the C++ layer.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
