I have a directory full of XML files. I wrote a parser that reads these files and indexes all of the XML attributes and properties into Lucene documents. An example of one of these documents is below. I'm parsing sentences into words and tagging the sentences based on certain criteria.
My issue is working out whether Lucene can handle cross-document searching. The example below is indexed as a single document, and there will be many more sentences before and after it throughout an entire transcript. Is it possible to say, "I want a result where one line marked as Symptom is 5 lines away from another line marked as Brand"? In essence, I'm trying to search across multiple Lucene documents. Any thoughts or literature out there?

    <transcript>
      <line id="1">
        <tag id="10" type="Symptom" />
        <tag id="12" type="Brand" />
        <word>
          <token>Coughing</token>
          <part-of-speech>SBJ</part-of-speech>
        </word>
        <word>
          <token>is</token>
          <part-of-speech>VB</part-of-speech>
        </word>
        <word>
          <token>caused</token>
          <part-of-speech>NP</part-of-speech>
        </word>
        <word>
          <token>by</token>
          <part-of-speech>PP</part-of-speech>
        </word>
        <word>
          <token>Mucinex</token>
          <part-of-speech>PDC</part-of-speech>
        </word>
      </line>
    </transcript>

Thanks so much!
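P.S. In case it makes the requirement more concrete, here is a rough sketch, in plain Python and entirely outside Lucene, of the check I would like a query to perform. Everything in it is an assumption made for illustration: the function name `find_tag_pairs`, the input shape (line id mapped to the set of tag types on that line), reading "5 lines away" as "at most 5 lines apart", and ignoring the case where both tags sit on the same line. It is not meant to show how Lucene would do this.

```python
from typing import Dict, List, Set, Tuple


def find_tag_pairs(
    lines: Dict[int, Set[str]],
    first_tag: str,
    second_tag: str,
    max_distance: int,
) -> List[Tuple[int, int]]:
    """Return (first_line_id, second_line_id) pairs at most max_distance lines apart.

    Assumption: line ids are consecutive integers within one transcript,
    so the difference between two ids is the distance in lines.
    """
    first_ids = sorted(i for i, tags in lines.items() if first_tag in tags)
    second_ids = sorted(i for i, tags in lines.items() if second_tag in tags)
    return [
        (a, b)
        for a in first_ids
        for b in second_ids
        # Assumption: a line carrying both tags does not count as a match.
        if a != b and abs(a - b) <= max_distance
    ]


# Hypothetical transcript: line id -> tag types found on that line.
transcript = {
    1: {"Symptom"},
    3: {"Brand"},
    6: {"Brand"},
    12: {"Symptom"},
    20: {"Brand"},
}

pairs = find_tag_pairs(transcript, "Symptom", "Brand", max_distance=5)
print(pairs)
```

With this made-up data, the Symptom on line 1 pairs with the Brands on lines 3 and 6, while the Symptom on line 12 has no Brand within 5 lines. What I'm hoping to find is a way to express the same constraint as a Lucene query when each line is its own document.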