Hi Christian,

Thank you so much for your quick answer! Both scripts you provided run
efficiently on my end. I had forgotten about pragmas; I tried to force the use
of indexes by addressing data()/text nodes explicitly, but that did not work.
I also remembered that maps can "do the trick", so I first converted the XML
into JSON and then tried to merge the files, but that did not work either. If
it is of interest to you, I uploaded the files and the query here:

script:
https://git.informatik.uni-leipzig.de/celano/perseus_morpheus/-/blob/master/join_json_files.xq
1st file:
https://git.informatik.uni-leipzig.de/celano/perseus_morpheus/-/blob/master/hib_parses.json
2nd file:
https://git.informatik.uni-leipzig.de/celano/perseus_morpheus/-/blob/master/hib_lemmas.json

Thanks again for your help!

Ciao,
Giuseppe


> On 12. Jul 2020, at 15:46, Christian Grün <christian.gr...@gmail.com> wrote:
> 
> One more solution that should be evaluated faster (the data to be
> looked up is directly stored in a map):
> 
> declare variable $hib_parses := db:open('hib_parses');
> declare variable $hib_lemmas := db:open('hib_lemmas');
> 
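> (: Build a map from lemma_id to its rows, keeping only lemma_lang_id = 3.
>    'combine' keeps all rows that share the same lemma_id. :)
> let $lemmas := map:merge(
>  for $row in $hib_lemmas//row
>  where $row/field[@name = 'lemma_lang_id'] = '3'
>  return map:entry($row/field[@name = 'lemma_id'], $row)
> , map { 'duplicates': 'combine' })
> 
> (: Look up the lemma rows for each parse directly in the map. :)
> for $parse in $hib_parses//row
> for $lemma in $lemmas($parse/field[@name = 'lemma_id'])
> (: The pragma disables node copying in the element constructors. :)
> return (# db:copynode false #) {
>  element wf  {
>    <f>{ $parse/* }</f>,
>    <l>{ $lemma/* }</l>
>  }
> }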
> 
> 
> 
> On 7/11/20, Giuseppe G. A. Celano <cel...@informatik.uni-leipzig.de> wrote:
>> Hi,
>> 
>> I am trying to perform a join operation between two large XML files (~490 MB
>> and ~40 MB), which are the result of the automatic conversion of old SQL
>> dumps into XML files. I created two databases for the files. The query I
>> wrote to join them appears to be correct, because it works when I limit the
>> join to just a few items, but it never finishes if I apply it to all items:
>> 
>> Here is the XQuery:
>> https://git.informatik.uni-leipzig.de/celano/perseus_morpheus/-/blob/master/join_files.xq
>> Here is the first file:
>> https://git.informatik.uni-leipzig.de/celano/perseus_morpheus/-/blob/master/hib_parses.xml
>> Here is the second file:
>> https://git.informatik.uni-leipzig.de/celano/perseus_morpheus/-/blob/master/hib_lemmas.xml
>> 
>> I have also tried to use the database module functions, but without success.
>> Am I missing anything here? Thanks.
>> 
>> Ciao,
>> Giuseppe
