Chris, Rida,

Here are the changes that I have made to XMLParseConfig.java in the
populateConfig(Document doc) method:


if (elemNode.getAttribute("nodeXpath") != null) {
        String nodeXpath = elemNode.getAttributeValue("namespace");
        xip.setNodeXpath(nodeXpath);
}
List fieldList = XPath.selectNodes(elemNode, "field");

if (fieldList != null) // modified 20062011 by Armel
{
        for (int j = 0; j < fieldList.size(); j++) {
                Element elem = (Element) fieldList.get(j);
                XMLField xf = populateXMLField(elem);
                fieldsColl.add(xf);
        }
}

/*
 * modified by Armel
 * 20062011
 * if fieldList is empty because it doesn't contain
 * an element "field"
 */
if (fieldList == null) {
        XMLField xf = populateXMLField(elemNode);
        fieldsColl.add(xf);
}
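
A minimal sketch of the fallback branch above, written against the JDK's built-in org.w3c.dom API rather than JDOM so that it runs on its own. It treats "no field children" as an empty list rather than null, on the assumption that a query matching nothing may return an empty list; whether XPath.selectNodes behaves that way should be checked against the JDOM version in use. The names FieldFallbackSketch, fieldNames and parseRoot are hypothetical, and using the element name as the field name is a simplification of what populateXMLField does.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical stand-in for the populateConfig fallback; not the parse-xml code.
class FieldFallbackSketch {

    /** Returns the field names that would be created for the given element. */
    static List<String> fieldNames(Element elemNode) {
        List<String> names = new ArrayList<>();
        NodeList children = elemNode.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Only direct <field> children count as configured fields.
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && "field".equals(child.getNodeName())) {
                names.add(((Element) child).getAttribute("name"));
            }
        }
        // Fallback: no <field> children were found, so the element itself
        // becomes the field (assumption: its name is used as the field name).
        if (names.isEmpty()) {
            names.add(elemNode.getNodeName());
        }
        return names;
    }

    /** Parses an XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // prints [title]
        System.out.println(fieldNames(parseRoot("<doc><field name=\"title\"/></doc>")));
        // prints [customer]
        System.out.println(fieldNames(parseRoot("<customer>Acme</customer>")));
    }
}
```

With a check on emptiness, the fallback also runs when the list exists but has no entries, which a null-only check would skip.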

And the populateXMLField(Element el) method:

if (elem.getAttribute("name") != null)
        xf.setFieldName(elem.getAttributeValue("name"));

if (elem.getAttribute("name") == null) // modified by Armel
{
        List att = elem.getAttributes();
        if (att != null) { // modified by Armel - loop and create field accordingly
                for (int i = 0; i < att.size(); i++) {
                        Attribute at = (Attribute) att.get(i);
                        xf.setFieldName(elem.getAttributeValue(at.getName()));
                }
        }
}
if (elem.getAttribute("xpath") != null)
        xf.setFieldXPath(elem.getAttributeValue("xpath"));

This is supposed to implement the feature I want; please advise.

Armel

-----Original Message-----
From: Chris Mattmann [mailto:[EMAIL PROTECTED] 
Sent: 20 November 2006 23:30
To: [email protected]
Subject: Re: What's the status of Nutch-GUI?

Hi Armel,

On 11/20/06 1:44 PM, "Armel T. Nene" <[EMAIL PROTECTED]> wrote:

> Hi Chris,
> 
> I am trying to extend parse-xml to enable the creation of lucene fields
> straight from an xml file. For example, a database table that has been
> parsed as an XML file should be stored in the index with the relevant
> fields, i.e. customer name, address and so on. This file will not have a
> namespace associated with it and should not be stored as "xmlcontent" in
> the database. Currently, parse-xml looks for known fields in the document
> and stores the associated values with the field name. I have added an
> extra condition so that, if the known fields are not present in the
> current document, the elements or nodes in the document become the new
> fields stored in the index with their values.

I think that this is fine.
> 
> Therefore, when parse-xml receives an xml document with no namespace
> available, it will parse the document and store each element name as a
> new field in the index, along with the element's associated value.
> 
> Let me know if I am on the right track because I know I don't have to
> write a separate plugin for this feature but just extend (or modify)
> parse-xml.

I think that parse-xml will support what you are talking about. In terms of
the "check" that you are doing to see if a field exists or not before adding
another value for it in the index, as I understand Lucene, I believe that
you could just omit this check and add the field regardless. If you add
multiple values for the same field in a Document, e.g.:

<snip>
Document doc = new Document();

doc.add(new Field("fieldname", "fieldvalue", ...));
doc.add(new Field("fieldname", "fieldvalue2",...));

</snip>

The values "fieldvalue" and "fieldvalue2" will both get stored in the
index for the key "fieldname". So, if I understand you correctly (which I
may not ;) ), then I think you can omit the check that you are talking about
above and just go with adding the same field name twice.
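
To make that concrete without pulling in Lucene, here is a rough sketch of the behaviour using only the JDK. MultiValuedFieldSketch is a hypothetical stand-in for a document and is not the Lucene API; it only models the idea that adding a second value under an existing name keeps both values, so no existence check is needed before adding.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a document with multi-valued fields; not Lucene's Document.
class MultiValuedFieldSketch {

    // Field name -> every value added under that name, in insertion order.
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    /** Adds a value without checking whether the field already exists. */
    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Returns all values stored for the field, or an empty list if none. */
    List<String> getValues(String name) {
        return fields.getOrDefault(name, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        MultiValuedFieldSketch doc = new MultiValuedFieldSketch();
        doc.add("fieldname", "fieldvalue");
        doc.add("fieldname", "fieldvalue2");
        // prints [fieldvalue, fieldvalue2]
        System.out.println(doc.getValues("fieldname"));
    }
}
```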

HTH,
  Chris

> 
> Cheers,
> 
> Armel
> 
> 
> -----Original Message-----
> From: Chris Mattmann [mailto:[EMAIL PROTECTED]
> Sent: 20 November 2006 18:40
> To: [email protected]
> Subject: Re: What's the status of Nutch-GUI?
> 
> Hi Sami and Scott,
> 
>  This is on my TO-DO list as one of the items that I will begin working on
> getting into the sources as a committer. Additionally, I plan on
> integrating and testing the parse-xml plugin into the source tree. As soon
> as I get my Apache account and SVN access, I will start working on this.
> 
> Thanks!
> 
> Cheers,
>   Chris
> 
> 
> 
> On 11/20/06 9:24 AM, "Sami Siren" <[EMAIL PROTECTED]> wrote:
> 
>> scott green wrote:
>>> Hi
>>> 
>>> Is nutch-gui dead? why i cannot find any source in svn repo?
>> 
>> Unfortunately the sources for the admin gui never got into svn. It would
>> be great if someone could pick it up and bring it up to date to get it
>> integrated.
>> 
>> --
>>   Sami Siren
>> 
> 
> 
> 
> 

______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                        Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.



