benj created DRILL-8481:
---------------------------

             Summary: Ability to query root attributes
                 Key: DRILL-8481
                 URL: https://issues.apache.org/jira/browse/DRILL-8481
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - XML
    Affects Versions: 1.21.1
            Reporter: benj


Hi,

It is possible to retrieve the attributes of fields, but not those of the root.
It would be useful to be able to retrieve the attributes found in the root 
node of XML files.
In my usual use cases, I have many XML files, each containing a single XML 
frame, often with one or more attributes in the root tag.
To recover these values, I am currently forced to preprocess the files to "copy" 
the attributes into the fields of the XML record.

Even with multiple XML records under the root, it would be useful for the 
root attributes to be accessible from each record.

Example (file aaa.xml):
{noformat}
<PPP Version="2023-001" TimeStamp="2023-06-09T21:17:14.416+02:00">
<P1 SubVersion="a1" MID="XX003" PN="156" SL="3"/>
<P2 SubVersion="b1"><Color>blue</Color></P2>
</PPP>
{noformat}

With the query:

{code:sql}
SELECT * FROM(SELECT filename, * FROM TABLE(dfs.test.`/aaa.xml`(type=>'xml', 
dataLevel=>1)) as xml) AS x;
{code}

I can access:
* P1_SubVersion
* P1_MID
* P1_PN
* P1_SL
* P2_SubVersion
* P2.Color

But I cannot access:
* PPP_Version
* PPP_TimeStamp

Changing dataLevel does not solve the problem.
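
For reference, the preprocessing workaround mentioned above could look roughly like the following. This is only a minimal sketch using Python's standard xml.etree.ElementTree, not part of Drill: the function name copy_root_attributes and the choice to prefix copied attributes with the root tag (PPP_Version, PPP_TimeStamp) are assumptions made for illustration, and how Drill names the resulting columns has not been checked here.

```python
import xml.etree.ElementTree as ET


def copy_root_attributes(xml_text: str) -> str:
    """Copy every attribute of the root element onto each direct child.

    Hypothetical helper: copied attributes are prefixed with the root tag
    so they do not clash with attributes the child already has.
    """
    root = ET.fromstring(xml_text)
    for child in root:
        for name, value in root.attrib.items():
            child.set(f"{root.tag}_{name}", value)
    return ET.tostring(root, encoding="unicode")


# Content of aaa.xml from the example above
sample = """<PPP Version="2023-001" TimeStamp="2023-06-09T21:17:14.416+02:00">
<P1 SubVersion="a1" MID="XX003" PN="156" SL="3"/>
<P2 SubVersion="b1"><Color>blue</Color></P2>
</PPP>"""

patched = copy_root_attributes(sample)
print(patched)
```

Having to run a step like this on every file before querying is what this improvement would avoid.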

Regards,





--
This message was sent by Atlassian Jira
(v8.20.10#820010)
