[ 
https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saad Patel updated PIG-3619:
----------------------------

    Description: 
XML is often loaded using XMLLoader with a record boundary tag as one of the 
parameters. A common use case is to then extract data from those records. XPath 
would make those extractions straightforward. I'm proposing a patch 
that adds simple XPath support as a UDF.

Example usage of the XPath UDF:

{code}
extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), 
XPath(record, 'book/title');
{code}

The proposed UDF also caches the last XML document. This improves 
performance when multiple consecutive XPath extractions are run on the same 
XML document, as in the example above. 
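
The caching idea can be sketched in plain Java using only the JDK's javax.xml APIs. This is a minimal illustration, not the code in xpath.patch: the class name CachingXPath, the evaluate method, and the parse counter are all hypothetical, and a real Pig UDF would wrap similar logic in an EvalFunc. It assumes that consecutive calls for the same record pass an identical XML string, so a string comparison is enough to detect a cache hit.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper (not taken from xpath.patch): evaluates an XPath
// expression against an XML string and reuses the parsed DOM when the
// same string is passed on consecutive calls.
final class CachingXPath {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private String lastXml;    // text of the most recently parsed document
    private Document lastDoc;  // parsed DOM for lastXml
    private int parseCount;    // number of parses, kept only for demonstration

    String evaluate(String xml, String expression) {
        try {
            // Re-parse only when the input differs from the cached document.
            if (!xml.equals(lastXml)) {
                DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
                lastDoc = builder.parse(new InputSource(new StringReader(xml)));
                lastXml = xml;
                parseCount++;
            }
            // Returns the string value of the first matching node.
            return xpath.evaluate(expression, lastDoc);
        } catch (Exception e) {
            throw new RuntimeException("XPath evaluation failed", e);
        }
    }

    int getParseCount() {
        return parseCount;
    }
}
```

With this sketch, evaluating 'book/author' and then 'book/title' against the same record parses the XML once. Caching only the last document keeps memory use constant, which fits the pattern of several extractions per record inside one FOREACH.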

  was:
Xml is often loaded using XMLLoader with a record boundary tag as one of the 
parameters. A common use case is to then extract data from those records. XPath 
would allow those extractions to be done very easily. I'm  proposing a patch 
that adds simple XPath support as a UDF.

Example usage of this the XPath UDF would be:

{code}
extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), 
XPath(record, 'book/title');
{code}


> Provide XPath function
> ----------------------
>
>                 Key: PIG-3619
>                 URL: https://issues.apache.org/jira/browse/PIG-3619
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>            Reporter: Saad Patel
>         Attachments: xpath.patch
>



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
