[ 
https://issues.apache.org/jira/browse/NUTCH-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693456#comment-14693456
 ] 

Pradumna Panditrao commented on NUTCH-2079:
-------------------------------------------

Hi,

1. At present, getParse parses the URL and the whole page. I want to extract 
only particular data, e.g. name, age and location, when the page contains 
it. Please guide me on how to do that.
2. Once I know exactly what the parse must contain for my requirement, I will 
make the corresponding changes in the index plugin.
3. Yes, I have added the same to gora-mongodb-mapping.xml.

Please let me know.


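My sample code:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.sax.PhoneExtractingContentHandler;

public class PhoneNumberExample {
    // Class name, method signature and the local list are assumed here;
    // the snippet as posted began in the middle of a method.
    static List<String> extractPhoneNumbers(File file) throws Exception {
        Parser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();
        // Phone number extractor: stores matches under "phonenumbers"
        PhoneExtractingContentHandler handler =
            new PhoneExtractingContentHandler(new BodyContentHandler(), metadata);
        // try-with-resources closes the stream even if parsing fails
        try (InputStream stream = new FileInputStream(file)) {
            parser.parse(stream, handler, metadata, new ParseContext());
        }
        List<String> phoneNumbers = new ArrayList<>();
        for (String number : metadata.getValues("phonenumbers")) {
            phoneNumbers.add(number);
        }
        return phoneNumbers;
    }
}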
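
To illustrate the kind of extraction asked about in point 1 (name, age, 
location), here is a minimal sketch in plain Java. It is not Nutch or Tika 
API: FieldExtractor and extract are hypothetical names, and it assumes the 
parsed page text carries the values as labelled lines such as "Name: ...". 
In a real plugin the text would come from the parse result, and the 
resulting map would be passed on to the indexing side.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
public class FieldExtractor {
    // Assumption: each field sits on its own line as "<label>: <value>".
    // The label set (name, age, location) is taken from the question above.
    private static final Pattern FIELD = Pattern.compile(
        "(?im)^[ \\t]*(name|age|location)[ \\t]*:[ \\t]*(.+?)[ \\t]*$");

    // Returns the first value found for each label, keyed by lower-case label.
    public static Map<String, String> extract(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(text);
        while (m.find()) {
            String key = m.group(1).toLowerCase(Locale.ROOT);
            if (!fields.containsKey(key)) {
                fields.put(key, m.group(2));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String text = "Profile\nName: Jane Doe\nAge: 30\nLocation: Pune\n";
        // prints {name=Jane Doe, age=30, location=Pune}
        System.out.println(extract(text));
    }
}
```

Keeping the extraction in a small helper like this means the parse plugin 
itself only has to call one method, which may reduce the coupling with 
other classes.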




> Tika Parsing plugin issue
> -------------------------
>
>                 Key: NUTCH-2079
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2079
>             Project: Nutch
>          Issue Type: New Feature
>          Components: deployment
>    Affects Versions: 2.3
>         Environment: Ubuntu 14.04
>            Reporter: Pradumna Panditrao
>             Fix For: 2.3
>
>
> Hi,
> I am trying to parse particular data and store it in MongoDB. However, when 
> I try to modify the parse-tika plugin, it is so interconnected with other 
> classes that the data is missed. I want to pick up particular data from a 
> website using the same plugin and put it into MongoDB.
> Please suggest how to do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
