Hi Greg,

So, to be clear and succinct: the goal is to create a single XML file containing the contents of every XML file that matches a predefined text string, yes?
If so, I'm wondering whether creating any database is necessary. A single pass through all the files, searching each one for the text string and appending matched files as you go, seems sufficient.

R

On Mon, Aug 5, 2019, 08:42 Greg Kawchuk <greg.kawc...@ualberta.ca> wrote:
> Hi everyone,
> I'm wondering if someone could provide what I think is a brief script for
> a scientific project to do the following.
> The goal is to identify XML documents from a very large collection that
> would be too big to load into a database all at once.
>
> Here is how I see the functions provided by the code:
> 1. In the script, the user could enter the path of the target folder (with
> millions of XML documents).
> 2. In the script, the user would enter the number of documents to load
> into a database at a given time (i = 1,000), depending on memory
> limitations.
> 3. The code would then create a temporary database from the first (i) XML
> files in the target folder.
> 4. The code would then search the 1,000 XML documents in the database for a
> pre-defined text string.
> 5. If hits exist for the text string, the code would write those documents
> to a unique XML file.
> 6. Clear the database.
> 7. Read in the next 1,000 files (or the remaining files in the folder).
> 8. Return to step 4.
>
> There would be no need to append XML files in step 5. The resulting XML
> files could be concatenated afterwards.
> Thank you in advance. If you have any questions, please feel free to email
> me here.
> Greg
>
> ***************************************************
> Greg Kawchuk BSc, DC, MSc, PhD
> Professor, Faculty of Rehabilitation Medicine
> University of Alberta
> greg.kawc...@ualberta.ca
> 780-492-6891
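To make the suggestion concrete, here is a minimal sketch of that single-pass approach in Python. It reads one file at a time (so memory use stays flat regardless of how many files are in the folder), checks for the search string, and appends every matching file's contents to one combined output file. The function name, the `<matches>` wrapper element, and all paths are my own placeholders, not anything from your project:

```python
import os

def collect_matches(folder, needle, out_path):
    """Append each XML file under `folder` containing `needle` to `out_path`.

    Note: `out_path` should live outside `folder`, or the output file
    would itself be picked up by the scan.
    """
    matched = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # A wrapper element keeps the combined result well-formed XML.
        out.write("<matches>\n")
        for entry in os.scandir(folder):
            if not entry.name.lower().endswith(".xml"):
                continue
            # Read one file at a time; nothing else is held in memory.
            with open(entry.path, encoding="utf-8", errors="replace") as f:
                text = f.read()
            if needle in text:
                out.write(text)
                out.write("\n")
                matched += 1
        out.write("</matches>\n")
    return matched
```

With millions of files, `os.scandir` is preferable to `os.listdir` because it streams directory entries instead of building one huge list. If the search string needs to be matched against element text rather than raw bytes (e.g. ignoring markup), you would parse each file with `xml.etree.ElementTree` instead of doing a plain substring test, at some speed cost.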