[ 
https://issues.apache.org/jira/browse/HIVE-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797520#action_12797520
 ] 

Patrick Angeles commented on HIVE-1027:
---------------------------------------


1) In general, XPath queries return a list of nodes. What does xpath_double 
(for example) return if the XPath expression evaluates to multiple nodes?

Only xpath() returns multiple nodes (list).

xpath_string() returns the text of the first matching node (and its subnodes, 
if any).
- xpath_string('<a>aa<b>b1</b><b>b2</b></a>','a') returns 'aab1b2'
- xpath_string('<a>aa<b>b1</b><b>b2</b></a>','b') returns 'b1'

xpath_double()/float() return the numeric value of the text of the first 
matching node, or NaN if the text value is not numeric.
xpath_int()/long()/short() return the numeric value of the text of the first 
matching node, or 0 if the text value is not numeric, or MAX_INT, MAX_LONG, 
MAX_SHORT respectively if the value overflows.
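
The first-match behaviour above can be reproduced with the plain JDK XPath 
API. The sketch below is not the code from the patch; the class and method 
names are made up for illustration. It assumes the numeric variants narrow 
the XPath number with an ordinary Java cast, which maps NaN to 0 and saturates 
at the maximum value for int. The JDK evaluates relative paths from the 
document node, so the sketch selects the b elements with 'a/b'.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class FirstMatchSketch {

    // Text of the first matching node, including the text of its subnodes.
    static String evalString(String xml, String path) {
        try {
            return (String) XPathFactory.newInstance().newXPath().evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Numeric value of the first matching node; NaN if its text is not a number.
    static double evalDouble(String xml, String path) {
        try {
            return (Double) XPathFactory.newInstance().newXPath().evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Assumed narrowing step: a Java cast turns NaN into 0 and clamps overflow.
    static int evalInt(String xml, String path) {
        return (int) evalDouble(xml, path);
    }

    public static void main(String[] args) {
        String xml = "<a>aa<b>b1</b><b>b2</b></a>";
        System.out.println(evalString(xml, "a"));    // aab1b2
        System.out.println(evalString(xml, "a/b"));  // b1
        System.out.println(evalDouble(xml, "a/b"));  // NaN
        System.out.println(evalInt(xml, "a/b"));     // 0
        System.out.println(evalInt("<n>99999999999</n>", "n"));  // 2147483647
    }
}
```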

2) Is the XPath query parsed for every input row, or only parsed once?

The XPath expression is compiled and cached. If the expression for the next 
row is the same as the previous one, the compiled form is reused; otherwise 
it is recompiled. So the XML is parsed for every input row, but the XPath 
expression is precompiled and reused in the vast majority of use cases.
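
A minimal sketch of that caching scheme, assuming a cache of exactly one 
compiled expression keyed on the expression text. The class name and the 
compileCount field are hypothetical and exist only to make the reuse visible; 
this is not the code from the patch.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical one-entry cache for the most recently compiled expression.
final class CachedXPathSketch {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private String lastPath;               // text of the cached expression
    private XPathExpression lastCompiled;  // its compiled form
    int compileCount = 0;                  // number of compilations so far

    String evalString(String xml, String path) {
        try {
            // Recompile only when the expression text differs from the last one.
            if (!path.equals(lastPath)) {
                lastCompiled = xpath.compile(path);
                lastPath = path;
                compileCount++;
            }
            // The XML itself is parsed again on every call.
            return lastCompiled.evaluate(new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        CachedXPathSketch udf = new CachedXPathSketch();
        System.out.println(udf.evalString("<a><b>b1</b></a>", "a/b"));  // b1
        System.out.println(udf.evalString("<a><b>b2</b></a>", "a/b"));  // b2
        System.out.println(udf.compileCount);                           // 1
    }
}
```

With a constant expression in the query, every row after the first takes the 
cached path.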

3a) Do you support DTD and XMLSchema?

Not sure how these would apply, as the Java XPath API is schema agnostic (no 
validation being performed). However, malformed xml (e.g., '<a><b>1</b></aa>') 
will result in a runtime exception being thrown.
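
For reference, this is roughly how the failure surfaces with the plain JDK 
API: the parse error arrives as a checked XPathExpressionException. Rethrowing 
it as an unchecked exception, as below, is an assumption about how a UDF might 
report it, and the names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
final class MalformedXmlSketch {

    // Evaluates path against xml; no DTD or schema validation is requested.
    static String evalOrThrow(String xml, String path) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(path, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            // Input that is not well-formed ends up here.
            throw new IllegalStateException("cannot evaluate XPath on input", e);
        }
    }

    // True if evaluating the input fails at runtime.
    static boolean failsAtRuntime(String xml, String path) {
        try {
            evalOrThrow(xml, path);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAtRuntime("<a><b>1</b></a>", "a/b"));   // false
        System.out.println(failsAtRuntime("<a><b>1</b></aa>", "a/b"));  // true
    }
}
```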

3b) What about namespace and backward axes in XPath?

Namespace is not currently supported, but could be easily added later.

Backward axes are supported:

> select xpath ('<a><b id="1"><c/></b><b id="2"><c/></b></a>','/descendant::c/ancestor::b/@id') from t1 limit 1 ;
["1","2"]
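
The same query can be checked against the JDK XPath API directly. This is a 
sketch with hypothetical names, assuming the list variant collects the value 
of each matched node in document order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
final class NodeListSketch {

    // Values of all matching nodes. getNodeValue() yields the value of
    // attribute and text nodes, and null for element nodes.
    static List<String> evalList(String xml, String path) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b id=\"1\"><c/></b><b id=\"2\"><c/></b></a>";
        // The ancestor axis walks back up from each c to its enclosing b.
        System.out.println(evalList(xml, "/descendant::c/ancestor::b/@id"));  // [1, 2]
    }
}
```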

4) If the XPath expression evaluates to an empty list, do you return NULL or 
an empty string (in the case of xpath())?

When no match is found:
xpath()  returns an empty list.
xpath_string() returns an empty string.
xpath_int(), float(), etc. will return 0.
xpath_boolean() will return false.
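
A short sketch of the no-match cases using the JDK XPath API, with 
hypothetical names. The JDK reports an empty match as NaN when a number is 
requested; the 0 shown for the integer case relies on the assumption that the 
result is narrowed with a plain Java cast.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
final class NoMatchSketch {

    // Evaluates path against xml and returns the result as the requested type.
    static Object eval(String xml, String path, QName type) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(path, new InputSource(new StringReader(xml)), type);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b>1</b></a>";
        String miss = "a/x";  // matches nothing in xml

        NodeList nodes = (NodeList) eval(xml, miss, XPathConstants.NODESET);
        System.out.println(nodes.getLength());                               // 0
        System.out.println("[" + eval(xml, miss, XPathConstants.STRING) + "]");  // []
        System.out.println(eval(xml, miss, XPathConstants.BOOLEAN));         // false
        double d = (Double) eval(xml, miss, XPathConstants.NUMBER);
        System.out.println((int) d);                                         // 0
    }
}
```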

> Create UDFs for XPath expression evaluation
> -------------------------------------------
>
>                 Key: HIVE-1027
>                 URL: https://issues.apache.org/jira/browse/HIVE-1027
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Patrick Angeles
>            Assignee: Patrick Angeles
>            Priority: Minor
>         Attachments: hive-1027.patch, udf_xpath.patch
>
>
> Create UDFs for evaluating XPath expressions against XML documents.
> Examples:
> > SELECT xpath_double ('<a><b class="odd">1</b><b class="even">2</b><b 
> > class="odd">4</b><c>8</c></a>', 'sum(a/b[@class="odd"])') FROM src LIMIT 
> > 1 ;
> 5.0
> > SELECT xpath_string ('<a><b>b1</b><b>b2</b></a>', 'a/b[2]') FROM src LIMIT 
> > 1 ;
> b2
> > SELECT xpath ('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>', 
> > 'a/c/text()') FROM src LIMIT 1 ;
> ["c1","c2"]
> Included functions are: xpath_short, xpath_int, xpath_long, xpath_float, 
> xpath_double/xpath_number, xpath_string, xpath
