Hi Ron, I agree that would be helpful. I’ve added a GitHub issue [1].
As you’ve already indicated, you can post-process your database instances. I think the easiest query for that is:

  delete nodes db:get('db')//*[empty(node())]

…followed by an optional db:optimize('db').

Best,
Christian

[1] https://github.com/BaseXdb/basex/issues/2203

On Thu, Apr 20, 2023 at 1:06 PM Ron Van den Branden
<ron.vdbran...@gmail.com> wrote:
>
> Hi all,
>
> I'm investigating a way of analysing a massive set of > 900.000 CSV
> files, for which the CSV parsing in BaseX seems very useful, producing
> a db nicely filled with documents such as:
>
> <csv>
>   <record>
>     <ResourceID>00003a92-d10e-585e-84a7-29ad17c5799f</ResourceID>
>     <source.id>bbcy:vev:6860</source.id>
>     <card>AA</card>
>     <order>0</order>
>     <source_field/>
>     <source_code/>
>     <Annotation>some remarks</Annotation>
>     <Annotation_Language>en</Annotation_Language>
>     <Annotation_Type/>
>     <resource_model/>
>     <!-- ... -->
>   </record>
>   <record>
>     <ResourceID>00003a92-d10e-585e-84a7-29ad17c5799f</ResourceID>
>     <source.id>bbcy:vev:6860</source.id>
>     <card>BE</card>
>     <order>0</order>
>     <source_field/>
>     <source_code>concept</source_code>
>     <Annotation/>
>     <Annotation_Language/>
>     <Annotation_Type/>
>     <resource_model/>
>     <!-- ... -->
>   </record>
>
>   <!-- ... -->
> </csv>
>
> Yet, when querying those documents, I'm noticing how just selecting
> non-empty elements is very slow. For example:
>
>   //source_code[normalize-space()]
>
> ...can take over 40 seconds.
>
> Since I don't have control over the source data, it would be really
> great if empty cells could be skipped when parsing CSV files. Of
> course this could be a trivial post-processing step via XSLT / XQuery,
> but that's unfeasible for that mass of data.
>
> Does BaseX provide a way of telling the CSV parser to skip empty cells?
>
> Best,
>
> Ron
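
[Editor's sketch of the post-processing step suggested above, assuming the database is named 'db' as in the original query; adjust the name to your own database:]

  (: Updating query: delete every leaf element with no content,
     i.e. the empty CSV cells in the parsed documents.
     Updating expressions must be run as their own query. :)
  delete nodes db:get('db')//*[empty(node())]

  (: Afterwards, in a separate query, optionally rebuild the indexes
     so subsequent path/value queries benefit from the smaller database:
     db:optimize('db')
  :)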