Hi Armel,
On 11/20/06 1:44 PM, "Armel T. Nene" <[EMAIL PROTECTED]> wrote:
> Hi Chris,
>
> I am trying to extend parse-xml to enable the creation of lucene fields
> straight from an XML file. For example, a database table that has been parsed
> as an XML file should be stored in the index with the relevant fields, i.e.
> customer name, address and so on. This file will not have a namespace
> associated with it and should not be stored as "xmlcontent" in the database.
> Currently, parse-xml looks for known fields in the document and stores the
> associated values with the field name. I have added an extra condition: if
> the known fields are not present in the current document, each element or
> node in the document should be stored in the index as a new field with its
> value.
I think that this is fine.
>
> Therefore, when parse-xml receives an XML document with no namespace
> available, it will parse the document and store each element name as a new
> field in the index, together with the element's value.
>
> Let me know if I am on the right track. As I understand it, I don't have to
> write a separate plugin for this feature; extending (or modifying) parse-xml
> should be enough.
I think that parse-xml will support what you are talking about. As for the
"check" that you are doing to see whether a field exists before adding
another value for it in the index: as I understand Lucene, you can omit
this check and add the field regardless. If you add multiple values for
the same field in a Document, e.g.:
<snip>
Document doc = new Document();
doc.add(new Field("fieldname", "fieldvalue", ...));
doc.add(new Field("fieldname", "fieldvalue2",...));
</snip>
Both values, "fieldvalue" and "fieldvalue2", will be stored in the index
under the key "fieldname". So, if I understand you correctly (which I
may not ;) ), I think you can omit the check that you are talking about
above and simply add the same field name twice.
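
To make the idea a bit more concrete, here is a rough sketch of the
"element name becomes field name" fallback with no existence check. It is
not the parse-xml code: the class ElementFieldSketch and the method
fieldsFromXml are hypothetical names, and a plain Map of lists stands in
for a Lucene Document so that the example runs with only the JDK. Each
leaf element's value is simply appended under its element name, so
repeated elements end up as multiple values for the same field.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: a Map<String, List<String>> stands in for a Lucene Document.
public class ElementFieldSketch {

    /** Maps each leaf element name to all of its text values, in document order. */
    public static Map<String, List<String>> fieldsFromXml(String xml) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        try {
            org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            collect(dom.getDocumentElement(), fields);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return fields;
    }

    // Walks the tree; elements without child elements are treated as fields.
    private static void collect(Element element, Map<String, List<String>> fields) {
        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                collect((Element) child, fields);
            }
        }
        if (!hasChildElement) {
            String value = element.getTextContent().trim();
            if (!value.isEmpty()) {
                // No "does this field already exist?" check: always append the value.
                fields.computeIfAbsent(element.getTagName(), k -> new ArrayList<>())
                      .add(value);
            }
        }
    }
}
```

For a customer record with one name element and two address elements,
fieldsFromXml returns one value under "name" and two under "address". In
the real plugin, the append would presumably be a doc.add(...) call like
the ones in the snippet above.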
HTH,
Chris
>
> Cheers,
>
> Armel
>
>
> -----Original Message-----
> From: Chris Mattmann [mailto:[EMAIL PROTECTED]
> Sent: 20 November 2006 18:40
> To: [email protected]
> Subject: Re: What's the status of Nutch-GUI?
>
> Hi Sami and Scott,
>
> This is on my TO-DO list as one of the items that I will begin working on
> getting into the sources as a committer. Additionally, I plan on integrating
> and testing the parse-xml plugin into the source tree. As soon as I get my
> Apache account and SVN access, I will start working on this.
>
> Thanks!
>
> Cheers,
> Chris
>
>
>
> On 11/20/06 9:24 AM, "Sami Siren" <[EMAIL PROTECTED]> wrote:
>
>> scott green wrote:
>>> Hi
>>>
>>> Is nutch-gui dead? why i cannot find any source in svn repo?
>>
>> Unfortunately the sources for the admin gui never got into svn. It would
>> be great if someone could pick it up and bring it up to date to get it
>> integrated.
>>
>> --
>> Sami Siren
>>
>
>
>
>
______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group
_________________________________________________
Jet Propulsion Laboratory Pasadena, CA
Office: 171-266B Mailstop: 171-246
_______________________________________________________
Disclaimer: The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.