I stand corrected. It does not look like the XML reader has any support for
arrays.
-- C

> On Aug 15, 2023, at 12:01 AM, Paul Rogers <par0...@gmail.com> wrote:
> 
> IIRC, the syntax for the "provided schema" for arrays is "ARRAY<type>" such
> as "ARRAY<DOUBLE>". This works, however, only if the XML reader uses the
> (very complex) EVF framework and has a way to control parsing based on the
> data type (and to set the data type based on parsing). The JSON reader has
> such an integration. Charles, did you do the work to add that kind of
> dynamic state machine to the XML parser?
> 
> - Paul
> 
> On Mon, Aug 14, 2023 at 6:28 PM Charles Givre <cgi...@gmail.com> wrote:
> 
>> Hi Mike,
>> It is theoretically possible but I don't have an example of the syntax.
>> As you've probably figured out, Drill vectors have both a type and data
>> mode.  The mode is REQUIRED, OPTIONAL (nullable), or REPEATED, if I remember correctly.
>> Thus, you could tell Drill via the inline schema that the data mode for a
>> given field is REPEATED and that would be the Drill equivalent of an
>> Array.  I've never actually done this, so I don't really know if it would
>> work for inline schemata but I'd assume that it would.
>> 
>> I'll do some digging to see whether I have any examples of this.
>> Best,
>> --C
>>
>>> On Aug 14, 2023, at 3:36 PM, Mike Beckerle <mbecke...@apache.org> wrote:
>>> 
>>> I'm trying to get my Drill SQL queries to produce the right thing from XML.
>>> 
>>> A major thing that you can't easily infer from looking at just XML data is
>>> what is an array. XML lacks an array starting indicator.
>>> 
>>> Is there an inline schema notation in the Drill Query language for
>>> array-ness, so that one can inform Drill what is an array?
>>> 
>>> For example this provides simple types for all the fields directly in the
>>> query.
>>> 
>>> @Test
>>> public void testSimpleProvidedSchema() throws Exception {
>>>   String sql = "SELECT * FROM table(cp.`xml/simple_with_datatypes.xml` (type => 'xml', schema " +
>>>     "=> 'inline=(`int_field` INT, `bigint_field` BIGINT, `float_field` FLOAT, `double_field` DOUBLE, `boolean_field` " +
>>>     "BOOLEAN, `date_field` DATE, `time_field` TIME, `timestamp_field` TIMESTAMP, `string_field`" +
>>>     " VARCHAR, `date2_field` DATE properties {`drill.format` = `MM/dd/yyyy`})'))";
>>> 
>>>   RowSet results = client.queryBuilder().sql(sql).rowSet();
>>>   assertEquals(2, results.rowCount());
>>>   // remainder of the test not included in this message
>>> }
>>> 
>>> 
>>> Can one also tell Drill what fields or child elements are arrays?
>> 
>> 

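For readers following the thread, here is a minimal sketch in Java of how the "ARRAY<type>" notation Paul mentions would be spelled inside an inline schema of the kind used in Mike's test. The helper class InlineSchemaSketch, the column names, and the file path xml/hypothetical_arrays.xml are all made up for illustration and are not part of Drill. The code only assembles the SQL text. Per the top reply, the XML reader does not appear to support arrays, so this shows the syntax only and should not be read as a query known to work against XML.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Drill: it only builds SQL text.
final class InlineSchemaSketch {

  // Renders column name -> type pairs as an inline provided schema,
  // e.g. inline=(`id` INT, `readings` ARRAY<DOUBLE>)
  static String inlineSchema(Map<String, String> columns) {
    StringJoiner joiner = new StringJoiner(", ", "inline=(", ")");
    for (Map.Entry<String, String> column : columns.entrySet()) {
      joiner.add("`" + column.getKey() + "` " + column.getValue());
    }
    return joiner.toString();
  }

  // Wraps the schema in the table() function form used in the quoted test.
  static String tableQuery(String path, Map<String, String> columns) {
    return "SELECT * FROM table(cp.`" + path + "` (type => 'xml', schema => '"
        + inlineSchema(columns) + "'))";
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps the columns in insertion order.
    Map<String, String> columns = new LinkedHashMap<>();
    columns.put("id", "INT");
    // Assumption: ARRAY<DOUBLE> marks the column as REPEATED DOUBLE.
    columns.put("readings", "ARRAY<DOUBLE>");
    System.out.println(tableQuery("xml/hypothetical_arrays.xml", columns));
  }
}
```

Whether a given reader honors the ARRAY declaration depends on the reader itself, which is the point Paul raises about EVF integration.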