jonkeane commented on PR #13070: URL: https://github.com/apache/arrow/pull/13070#issuecomment-1126151870
> Initially I started by trying to apply the same logic as `as.Date.character()`: find the first non-null element and attempt to parse it with the first format, then move on to the next format, etc. Computationally it should be relatively efficient since we would try the formats on a single element, but I couldn't figure out how to do this with expressions (find the first non-null, apply the format, force the computation, if NA, try the next one, etc.).

Your description of how base R accomplishes this is good, and as you mention, iterating through the data (even to get a single element) might be complicated or problematic.

The next step I would take is to explore what functionality is available. You seem to have found `IsValid()`, which is good. You'll need to look at both Arrays and Expressions, since you could get either in a dplyr query like this (are there any other object types you might encounter there?). Then I would try to construct a way to determine the format and then apply it. In this exploration it's OK to write unreasonable code, or code we might not want to ship, just to see whether it's possible. This type of data introspection is not common in our bindings, but you might look around to see if any others do something similar (I don't know of any off the top of my head, but it's possible that one exists).

Another approach might be to try other ways of constructing expressions that don't use `coalesce` but instead produce multiple columns, one of which you select at the end of the expression.

It's entirely possible that this type of introspection + format selection won't be feasible in R-only code either. But as always, it's helpful to explore and have something that roughly works, even if it's not what we ultimately want, so that we can use it to create tests and then implement it in the C++ layer.
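To make the semantic difference between these approaches concrete, here is a minimal sketch in plain Python (standard library only; it does not use the arrow R bindings or the Arrow C++ API). It assumes a column is modeled as a list in which `None` stands for null, and all function names are hypothetical. `parse_with_selected_format()` roughly mirrors the base R behavior described above: inspect only the first non-null element, pick the first format that parses it, and apply that single format to the whole column. `coalesce_parse()` shows the per-element result of coalescing one parsed column per format, and `select_from_candidates()` is one possible reading of the "multiple columns, select one at the end" idea.

```python
from datetime import date, datetime
from typing import List, Optional, Sequence

Column = Sequence[Optional[str]]  # None models a null value


def try_parse(value: str, fmt: str) -> Optional[date]:
    """Parse value with fmt; return None if it doesn't match.
    Note: Python's strptime requires the whole string to match."""
    try:
        return datetime.strptime(value, fmt).date()
    except ValueError:
        return None


def pick_format(column: Column, formats: Sequence[str]) -> Optional[str]:
    """Inspect only the first non-null element and return the first
    format that parses it; None if the column is entirely null."""
    first = next((v for v in column if v is not None), None)
    if first is None:
        return None
    for fmt in formats:
        if try_parse(first, fmt) is not None:
            return fmt
    # base R's as.Date() errors here (unless optional = TRUE)
    raise ValueError("character string is not in a standard unambiguous format")


def parse_with_selected_format(column: Column, formats: Sequence[str]) -> List[Optional[date]]:
    """Apply the single selected format to every element."""
    fmt = pick_format(column, formats)
    if fmt is None:
        return [None] * len(column)
    return [None if v is None else try_parse(v, fmt) for v in column]


def parse_all(column: Column, formats: Sequence[str]) -> List[List[Optional[date]]]:
    """Build one parsed candidate column per format."""
    return [[None if v is None else try_parse(v, fmt) for v in column] for fmt in formats]


def coalesce_parse(column: Column, formats: Sequence[str]) -> List[Optional[date]]:
    """Per-row first non-null result across formats (coalesce semantics)."""
    candidates = parse_all(column, formats)
    return [next((c[i] for c in candidates if c[i] is not None), None)
            for i in range(len(column))]


def select_from_candidates(column: Column, formats: Sequence[str]) -> List[Optional[date]]:
    """Build all candidate columns, then select one whole column: the first
    whose value at the first non-null input position parsed successfully."""
    candidates = parse_all(column, formats)
    idx = next((i for i, v in enumerate(column) if v is not None), None)
    if idx is None:
        return [None] * len(column)
    for cand in candidates:
        if cand[idx] is not None:
            return cand
    raise ValueError("character string is not in a standard unambiguous format")


formats = ["%Y-%m-%d", "%Y/%m/%d"]
col = [None, "2022/05/13", "2022-05-14"]
print(parse_with_selected_format(col, formats))
print(coalesce_parse(col, formats))
```

On this input the approaches disagree on the last row: selecting one format from the first non-null value (either directly or by selecting a whole candidate column) yields a null there, while coalescing across formats parses it with the other format.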
