I stand corrected... It does not look like the XML reader has any support for arrays.
-- C

> On Aug 15, 2023, at 12:01 AM, Paul Rogers <par0...@gmail.com> wrote:
>
> IIRC, the syntax for the "provided schema" for arrays is "ARRAY<type>" such
> as "ARRAY<DOUBLE>". This works, however, only if the XML reader uses the
> (very complex) EVF framework and has a way to control parsing based on the
> data type (and to set the data type based on parsing). The JSON reader has
> such an integration. Charles, did you do the work to add that kind of
> dynamic state machine to the XML parser?
>
> - Paul
>
> On Mon, Aug 14, 2023 at 6:28 PM Charles Givre <cgi...@gmail.com> wrote:
>
>> Hi Mike,
>> It is theoretically possible but I don't have an example of the syntax.
>> As you've probably figured out, Drill vectors have both a type and data
>> mode. The mode is either NULLABLE or REPEATED if I remember correctly.
>> Thus, you could tell Drill via the inline schema that the data mode for a
>> given field is REPEATED and that would be the Drill equivalent of an
>> Array. I've never actually done this, so I don't really know if it would
>> work for inline schemata but I'd assume that it would.
>>
>> I'll do some digging to see whether I have any examples of this.
>> Best,
>> --C
>>
>>> On Aug 14, 2023, at 3:36 PM, Mike Beckerle <mbecke...@apache.org> wrote:
>>>
>>> I'm trying to get my Drill SQL queries to produce the right thing from XML.
>>>
>>> A major thing that you can't easily infer from looking at just XML data is
>>> what is an array. XML lacks an array starting indicator.
>>>
>>> Is there an inline schema notation in the Drill Query language for
>>> array-ness, so that one can inform Drill what is an array?
>>>
>>> For example this provides simple types for all the fields directly in the
>>> query.
>>>
>>> @Test
>>> public void testSimpleProvidedSchema() throws Exception {
>>>   String sql = "SELECT * FROM table(cp.`xml/simple_with_datatypes.xml` (type => 'xml', schema " +
>>>     "=> 'inline=(`int_field` INT, `bigint_field` BIGINT, `float_field` FLOAT, `double_field` DOUBLE, `boolean_field` " +
>>>     "BOOLEAN, `date_field` DATE, `time_field` TIME, `timestamp_field` TIMESTAMP, `string_field`" +
>>>     " VARCHAR, `date2_field` DATE properties {`drill.format` = `MM/dd/yyyy`})'))";
>>>
>>>   RowSet results = client.queryBuilder().sql(sql).rowSet();
>>>   assertEquals(2, results.rowCount());
>>>
>>> Can one also tell Drill what fields or child elements are arrays?