From what I can see, you are only passing volume.xml to your parser. If I understand your code and questions correctly, the Volume file simply points to the actual articles that you want to parse. Seems like you need to parse the Volume file, get the name/location of the article file and then parse that for entry in Lucene. Or am I mistaken? What does the volume file look like?

Malcolm wrote:

It's XML like this. It has 120-ish volumes with references to 12,107 articles 
which are like this below:
<article>
<fno>A1003</fno>
<doi>10.1041/A1003s-1995</doi>
<fm><hdr><hdr1><ti>IEEE Annals of the History of Computing</ti>
<crt><issn>1058-6180</issn>/95/$4.00 <cci><onm>&copy; 1995 
IEEE</onm></cci></crt></hdr1>
<hdr2><obi><volno>Vol. 17</volno>, <issno>No. 1</issno></obi>
<pdt><mo>Spring</mo><yr>1995</yr></pdt>
<pp>pp. 3-3</pp></hdr2></hdr>
<tig>
<atl>About this Issue</atl><pn>pp. 3-3</pn></tig>
<au 
sequence="first"><fnm>J.A.N.</fnm><snm>Lee</snm><role>Editor&hyphen;in&hyphen;Chief</role></au>
</fm>
<bdy>
<p>The first issue of our 17th volume is as diverse in topics as any nontheme issue that we 
have tried to present over the past many years. However, it still represents the work of the 
English&hyphen;speaking world of the North Atlantic rather than a broader picture of computing in 
the whole world. The Editorial Board and the article editors of the <it>Annals</it>
are doing their best to bring the history of the whole world of computing to our readers, 
but it does require authors in other countries to offer their manuscripts for our 
consideration. Please take this as an open invitation to authors in other parts of the 
world to submit papers to the <it>Annals</it>
for review and help us to follow the lead of our parent organization in being the &ldquo;The 
World&rsquo;s Computer Society.&rdquo;</p>
<p>The five major articles in this issue represent several manuscripts that have been in our files for some time, 
and we are grateful to the authors for having &ldquo;stuck with us&rdquo; while we reviewed, 
re&hyphen;reviewed, and reworked their papers. Articles in the field of history do not always present the work of 
the authors themselves (though we welcome pioneers to give us their own stories, as in the case of the 1935 article by 
John McPherson in this issue); thus, answering the question &ldquo;is it accurate?&rdquo; is not always easy. 
In fact, we ask our referees to answer the following questions about each manuscript, and their responses determine 
whether we accept the manuscript &ldquo;as is&rdquo; or whether we ask the author(s) to revise the 
material:</p>
<l2>
<li>
<p>Are the issues addressed in the paper stated clearly enough?</p></li>
<li>
</bdy></article>


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to