[https://issues.apache.org/jira/browse/SPARK-16203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552155#comment-16552155]
Nick Nicolini commented on SPARK-16203:
---------------------------------------
Cool, added a ticket here: https://issues.apache.org/jira/browse/SPARK-24884
I think the above is the same feature that [~mmoroz] was asking for, so IMO we
should close this ticket in favor of the newer one.
> regexp_extract to return an ArrayType(StringType())
> ---------------------------------------------------
>
> Key: SPARK-16203
> URL: https://issues.apache.org/jira/browse/SPARK-16203
> Project: Spark
> Issue Type: Improvement
> Affects Versions: 2.0.0
> Reporter: Max Moroz
> Priority: Minor
>
> regexp_extract only returns a single matched group. If (as is often the case,
> e.g., in web log parsing) we need to parse the entire line and get all the
> groups, we'll need to call it as many times as there are groups.
> It's only a minor annoyance syntactically.
> But unless I misunderstand something, it would be very inefficient. (How
> would Spark know not to perform multiple pattern-matching operations when only
> one is needed? Or does the optimizer actually check whether the patterns are
> identical and, if so, avoid the repeated regex matching?)
> Would it be possible to have it return an array when the index is not
> specified (defaulting to None)?
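A minimal sketch of the difference the reporter describes, written in plain Python with the standard `re` module rather than Spark (no Spark API is used or implied). `extract_group` mimics the per-group workaround, i.e. one `regexp_extract(col, pattern, idx)` call per group index, each running its own match. `extract_all_groups` is a hypothetical helper showing the requested behavior: one match per line returning every group as a list, analogous to an `ArrayType(StringType())` column. Both return empty strings when the pattern or a group does not match, an assumption modeled on `regexp_extract`'s documented behavior. The log pattern and sample line are illustrative only.

```python
import re
from typing import List

# Hypothetical helper (NOT a Spark API): the per-group workaround.
# Each call runs its own regex search, like one regexp_extract call per index.
def extract_group(line: str, pattern: str, idx: int) -> str:
    m = re.search(pattern, line)
    if m is None or m.group(idx) is None:
        return ""  # assumption: "" on no match, as regexp_extract does
    return m.group(idx)

# Hypothetical helper (NOT a Spark API): the requested behavior.
# A single regex search returns every capturing group as a list.
def extract_all_groups(line: str, pattern: str) -> List[str]:
    m = re.search(pattern, line)
    if m is None:
        # No match: one empty string per capturing group in the pattern.
        return [""] * re.compile(pattern).groups
    # Optional groups that did not participate come back as None; map to "".
    return [g if g is not None else "" for g in m.groups()]

# Illustrative web-log pattern: host, timestamp, method, path, status.
LOG_PATTERN = r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3})'
line = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'

per_group = [extract_group(line, LOG_PATTERN, i) for i in range(1, 6)]  # 5 searches
all_at_once = extract_all_groups(line, LOG_PATTERN)                     # 1 search
print(all_at_once)
# ['127.0.0.1', '10/Oct/2000:13:55:36 -0700', 'GET', '/index.html', '200']
```

Here the workaround runs five searches over the same line while the array-returning version runs one. Whether Spark's optimizer collapses repeated `regexp_extract` calls on the same pattern is the open question in the report; this sketch does not answer it.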