benj created DRILL-7954:
---------------------------
Summary: XML ability to not concatenate fields and attribute - change presentation of data
Key: DRILL-7954
URL: https://issues.apache.org/jira/browse/DRILL-7954
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.19.0
Reporter: benj
Given an XML file containing the following data:
{noformat}
<a>
  <attr>
    <set num="0" val="1">x</set>
    <set num="1" val="2">y</set>
  </attr>
  <attr>
    <set num="2" val="a">z</set>
    <set num="3" val="b">a</set>
  </attr>
</a>
{noformat}
{noformat}
apache drill> SELECT * FROM TABLE(dfs.test.`attributetest.xml`(type=>'xml', dataLevel=>1)) as x;
+-----------------------------------------------+----------------+
| attributes                                    | attr           |
+-----------------------------------------------+----------------+
| {"attr_set_num":"0123","attr_set_val":"12ab"} | {"set":"xyza"} |
+-----------------------------------------------+----------------+
apache drill> SELECT * FROM TABLE(dfs.test.`attributetest.xml`(type=>'xml', dataLevel=>2)) as x;
+---------------------------------+-----+
| attributes                      | set |
+---------------------------------+-----+
| {"set_num":"01","set_val":"12"} | xy  |
| {"set_num":"23","set_val":"ab"} | za  |
+---------------------------------+-----+
apache drill> SELECT * FROM TABLE(dfs.test.`attributetest.xml`(type=>'xml', dataLevel=>3)) as x;
+------------+
| attributes |
+------------+
| {}         |
| {}         |
| {}         |
| {}         |
+------------+
{noformat}
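Attributes and fields with the same name are concatenated across all matching elements, which makes the result unusable: for example, attr_set_num becomes "0123" and the individual values can no longer be recovered. _(Adding a separator might help, but that is not the point here.)_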
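To make the information loss concrete, here is a small Java sketch. The class ConcatAmbiguity and its concat method are hypothetical; they only mimic the separator-free joining step described above, not Drill's actual XML reader code. Two different sets of attribute values produce the same string, so the original values cannot be recovered from it.

```java
import java.util.List;

public class ConcatAmbiguity {
    // Hypothetical helper: joins values with no separator, mimicking how
    // repeated attributes such as attr_set_num end up as a single string.
    public static String concat(List<String> values) {
        return String.join("", values);
    }

    public static void main(String[] args) {
        // Four single-character values, as in the sample file above.
        String fromSample = concat(List.of("0", "1", "2", "3"));
        // Two two-character values, as a different (hypothetical) file might contain.
        String fromOther = concat(List.of("01", "23"));
        // Both inputs collapse to "0123", so the two cases are indistinguishable.
        System.out.println(fromSample + " " + fromOther + " " + fromSample.equals(fromOther));
    }
}
```

Running it prints "0123 0123 true": the concatenated value alone cannot tell the two inputs apart.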
What we really need is the ability to obtain something like the following _(depending on the defined level)_:
{noformat}
+---------------------------------------------------------------------------------------------------+
| attr                                                                                              |
+---------------------------------------------------------------------------------------------------+
| [{"set":"x","_attributes":{"num":"0","val":"1"}},{"set":"y","_attributes":{"num":"1","val":"2"}}] |
| [{"set":"z","_attributes":{"num":"2","val":"a"}},{"set":"a","_attributes":{"num":"3","val":"b"}}] |
+---------------------------------------------------------------------------------------------------+
+-------------------------------------------------+
| set                                             |
+-------------------------------------------------+
| {"set":"x","_attributes":{"num":"0","val":"1"}} |
| {"set":"y","_attributes":{"num":"1","val":"2"}} |
| {"set":"z","_attributes":{"num":"2","val":"a"}} |
| {"set":"a","_attributes":{"num":"3","val":"b"}} |
+-------------------------------------------------+
{noformat}
The _attributes field could be generated at each level, attached to the element it belongs to, instead of being generated with a path from the top level (as in attr_set_num). This would allow working with the data at any level without losing information.
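A minimal Java sketch of the requested presentation, assuming the JDK's built-in DOM parser (javax.xml.parsers) rather than Drill's own XML reader. The class AttributesPerLevel and its methods are hypothetical and hard-code the two-level a/attr/set shape of the sample; values are not JSON-escaped. Each element keeps its own attributes under _attributes, so rows built at the attr level and at the set level both retain their num/val pairs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class AttributesPerLevel {
    // Parses an XML string; checked parser exceptions are wrapped for brevity.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Returns the element children of a node, skipping whitespace text nodes.
    static List<Element> childElements(Node parent) {
        List<Element> out = new ArrayList<>();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i) instanceof Element) {
                out.add((Element) kids.item(i));
            }
        }
        return out;
    }

    // Renders one leaf as {"<tag>":"<text>","_attributes":{...}}, keeping only
    // the attributes of this element (no path prefix, no concatenation).
    static String leafToJson(Element e) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"").append(e.getTagName()).append("\":\"")
          .append(e.getTextContent()).append("\",\"_attributes\":{");
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(a.getNodeName()).append("\":\"")
              .append(a.getNodeValue()).append('"');
        }
        return sb.append("}}").toString();
    }

    // One row per <attr>: an array of its <set> children with local attributes.
    public static List<String> rowsPerAttr(Document doc) {
        List<String> rows = new ArrayList<>();
        for (Element attr : childElements(doc.getDocumentElement())) {
            List<String> items = new ArrayList<>();
            for (Element set : childElements(attr)) {
                items.add(leafToJson(set));
            }
            rows.add("[" + String.join(",", items) + "]");
        }
        return rows;
    }

    // One row per <set>: going one level deeper does not merge attributes across rows.
    public static List<String> rowsPerSet(Document doc) {
        List<String> rows = new ArrayList<>();
        for (Element attr : childElements(doc.getDocumentElement())) {
            for (Element set : childElements(attr)) {
                rows.add(leafToJson(set));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Document doc = parse("<a><attr><set num=\"0\" val=\"1\">x</set><set num=\"1\" val=\"2\">y</set></attr>"
            + "<attr><set num=\"2\" val=\"a\">z</set><set num=\"3\" val=\"b\">a</set></attr></a>");
        rowsPerAttr(doc).forEach(System.out::println);
        rowsPerSet(doc).forEach(System.out::println);
    }
}
```

Running main prints the two attr-level rows followed by the four set-level rows of the desired output. Keeping _attributes local to each element is what lets both levels share one representation; a real reader would also need JSON escaping and arbitrary nesting depth.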