This is awesome Mark, thanks!

This will be quite useful for everybody else as well.

I ended up writing my own and went further with the other part of the extraction.
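
A minimal sketch of the tree searches described at the bottom of this thread (first noun by breadth-first search, deepest verb in the VP subtree). To keep it runnable without the models, Node is a hypothetical stand-in for opennlp.tools.parser.Parse, and TripletSearch, firstByPrefix and deepestVerb are names made up for illustration. Matching tags by prefix ("NN", "VB") is also an assumption.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for a parse node: tag, covered text, children. */
final class Node {
    final String type;
    final String text;
    final List<Node> children;

    Node(String type, String text, Node... children) {
        this.type = type;
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

final class TripletSearch {
    /** Breadth-first search: first node whose tag starts with the prefix, or null. */
    static Node firstByPrefix(Node root, String prefix) {
        Queue<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node n = queue.remove();
            if (n.type.startsWith(prefix)) {
                return n;
            }
            queue.addAll(n.children);
        }
        return null;
    }

    /** Deepest VB* node under root; the first one found wins ties. Null if none. */
    static Node deepestVerb(Node root) {
        return deepest(root, 0, new int[] {-1}, null);
    }

    private static Node deepest(Node n, int depth, int[] bestDepth, Node best) {
        if (n.type.startsWith("VB") && depth > bestDepth[0]) {
            bestDepth[0] = depth;
            best = n;
        }
        for (Node child : n.children) {
            best = deepest(child, depth + 1, bestDepth, best);
        }
        return best;
    }
}
```

With a real Parse the same logic should carry over by using getType(), getChildren() and getCoveredText() in place of the fields above. On Mark's example sentence the deepest verb in the top VP would be "increasing" rather than "broke", so the predicate rule probably needs restricting to the first clause.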

What I found interesting is how long it takes to load the
model en-parser-chunking.bin, which is about 36 MB.

So I am not loading it every time, just once during object creation.
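
Roughly, the load-once idea looks like the sketch below. LazyModel is a hypothetical helper name, and the loader is left abstract so the snippet runs without the model file. In the real case T would be ParserModel and the loader would read en-parser-chunking.bin.

```java
import java.util.function.Supplier;

/**
 * Loads an expensive resource at most once and hands out the same
 * instance afterwards. Hypothetical helper, not part of OpenNLP.
 */
final class LazyModel<T> {
    private final Supplier<T> loader; // wraps the slow read, e.g. the model file
    private volatile T cached;        // stays null until the first load

    LazyModel(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Returns the cached instance, loading it on first use (thread-safe). */
    T get() {
        T local = cached;
        if (local == null) {
            synchronized (this) {
                local = cached;
                if (local == null) {
                    local = loader.get();
                    cached = local;
                }
            }
        }
        return local;
    }
}
```

As far as I understand, the model object can be shared this way while each thread should still create its own Parser from it, but I have not verified that against the documentation.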

Does anyone have a better suggestion?

cheers.


On Thu, Sep 26, 2013 at 4:59 PM, Mark G <[email protected]> wrote:

> Carlos, I threw this together to show how to get a Parser running.
> Look at what this prints, I think you may be able to iterate through
> topParses[] and traverse the tree. If there is a more efficient way I am
> sure the other OpenNLPers will chime in.
>
>
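>   import java.io.FileInputStream;
>   import java.io.IOException;
>   import java.io.InputStream;
>   import opennlp.tools.cmdline.parser.ParserTool;
>   import opennlp.tools.parser.Parse;
>   import opennlp.tools.parser.Parser;
>   import opennlp.tools.parser.ParserFactory;
>   import opennlp.tools.parser.ParserModel;
>   import opennlp.tools.util.InvalidFormatException;
>
>   public static void main(String[] args) throws InvalidFormatException, IOException {
>
>     // Load the parser model from disk (adjust the path to your setup).
>     InputStream is = new FileInputStream("c:\\temp\\opennlpmodels\\en-parser-chunking.bin");
>     ParserModel model = new ParserModel(is);
>     is.close();
>     Parser parser = ParserFactory.create(model);
>
>     String sentence = "The countries broke off peace talks following the "
>         + "Mumbai attacks but have begun discussions again, focusing on increasing "
>         + "trade.";
>     // Ask for the single best parse; parseLine splits the text on whitespace.
>     Parse topParses[] = ParserTool.parseLine(sentence, parser, 1);
>
>     Parse p = topParses[0];
>     p.showCodeTree();  // prints the indexed node listing
>     p.show();          // prints the bracketed tree
>     p.getParent();     // navigation: parent node (result unused here)
>     p.getChildren();   // navigation: child nodes (result unused here)
>
>     System.out.println(p.getText());
>   }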
>
> It should print all this...
>
> [0] S 2092924121 -> 2092924121 TOP The countries broke off peace talks
> following the Mumbai attacks but have begun discussions again, focusing on
> increasing trade.
> [0.0] NP 2092766686 -> 2092924121 S The countries
> [0.0.0] DT 2092752996 -> 2092766686 NP The
> [0.0.0.0] TK 2092752996 -> 2092752996 DT The
> [0.0.1] NNS 2092969298 -> 2092766686 NP countries
> [0.0.1.0] TK 2092969298 -> 2092969298 NNS countries
> [0.1] VP 2093633263 -> 2092924121 S broke off peace talks following the
> Mumbai attacks but have begun discussions again, focusing on increasing
> trade.
> [0.1.0] VP 2093545647 -> 2093633263 VP broke off peace talks following the
> Mumbai attacks
> [0.1.0.0] VBD 2093484042 -> 2093545647 VP broke
> [0.1.0.0.0] TK 2093484042 -> 2093484042 VBD broke
> [0.1.0.1] PRT 2093793436 -> 2093545647 VP off
> [0.1.0.1.0] RP 2093793436 -> 2093793436 PRT off
> [0.1.0.1.0.0] TK 2093793436 -> 2093793436 RP off
> [0.1.0.2] NP 2094012476 -> 2093545647 VP peace talks
> [0.1.0.2.0] NN 2094004262 -> 2094012476 NP peace
> [0.1.0.2.0.0] TK 2094004262 -> 2094004262 NN peace
> [0.1.0.2.1] NNS 2094316394 -> 2094012476 NP talks
> [0.1.0.2.1.0] TK 2094316394 -> 2094316394 NNS talks
> [0.1.0.3] PP 2094660013 -> 2093545647 VP following the Mumbai attacks
> [0.1.0.3.0] VBG 2094634002 -> 2094660013 PP following
> [0.1.0.3.0.0] TK 2094634002 -> 2094634002 VBG following
> [0.1.0.3.1] NP 2095166543 -> 2094660013 PP the Mumbai attacks
> [0.1.0.3.1.0] DT 2095146008 -> 2095166543 NP the
> [0.1.0.3.1.0.0] TK 2095146008 -> 2095146008 DT the
> [0.1.0.3.1.1] NNP 2095358203 -> 2095166543 NP Mumbai
> [0.1.0.3.1.1.0] TK 2095358203 -> 2095358203 NNP Mumbai
> [0.1.0.3.1.2] NNS 2095723726 -> 2095166543 NP attacks
> [0.1.0.3.1.2.0] TK 2095723726 -> 2095723726 NNS attacks
> [0.1.1] CC 2096134426 -> 2093633263 VP but
> [0.1.1.0] TK 2096134426 -> 2096134426 CC but
> [0.1.2] VP 2096419178 -> 2093633263 VP have begun discussions again,
> focusing on increasing trade.
> [0.1.2.0] VBP 2096343883 -> 2096419178 VP have
> [0.1.2.0.0] TK 2096343883 -> 2096343883 VBP have
> [0.1.2.1] VP 2096672443 -> 2096419178 VP begun discussions again, focusing
> on increasing trade.
> [0.1.2.1.0] VBN 2096605362 -> 2096672443 VP begun
> [0.1.2.1.0.0] TK 2096605362 -> 2096605362 VBN begun
> [0.1.2.1.1] NP 2096925708 -> 2096672443 VP discussions
> [0.1.2.1.1.0] NNS 2096925708 -> 2096925708 NP discussions
> [0.1.2.1.1.0.0] TK 2096925708 -> 2096925708 NNS discussions
> [0.1.2.1.2] PP 2097584197 -> 2096672443 VP again, focusing on increasing
> trade.
> [0.1.2.1.2.0] IN 2097543127 -> 2097584197 PP again,
> [0.1.2.1.2.0.0] TK 2097543127 -> 2097543127 IN again,
> [0.1.2.1.2.1] S 2097938768 -> 2097584197 PP focusing on increasing trade.
> [0.1.2.1.2.1.0] VP 2097938768 -> 2097938768 S focusing on increasing trade.
> [0.1.2.1.2.1.0.0] VBG 2097910019 -> 2097938768 VP focusing
> [0.1.2.1.2.1.0.0.0] TK 2097910019 -> 2097910019 VBG focusing
> [0.1.2.1.2.1.0.1] PP 2098394645 -> 2097938768 VP on increasing trade.
> [0.1.2.1.2.1.0.1.0] IN 2098370003 -> 2098394645 PP on
> [0.1.2.1.2.1.0.1.0.0] TK 2098370003 -> 2098370003 IN on
> [0.1.2.1.2.1.0.1.1] NP 2098546604 -> 2098394645 PP increasing trade.
> [0.1.2.1.2.1.0.1.1.0] VBG 2098537021 -> 2098546604 NP increasing
> [0.1.2.1.2.1.0.1.1.0.0] TK 2098537021 -> 2098537021 VBG increasing
> [0.1.2.1.2.1.0.1.1.1] NN 2099103787 -> 2098546604 NP trade.
> [0.1.2.1.2.1.0.1.1.1.0] TK 2099103787 -> 2099103787 NN trade.
> (TOP (S (NP (DT The) (NNS countries)) (VP (VP (VBD broke) (PRT (RP off))
> (NP (NN peace) (NNS talks)) (PP (VBG following) (NP (DT the) (NNP Mumbai)
> (NNS attacks)))) (CC but) (VP (VBP have) (VP (VBN begun) (NP (NNS
> discussions)) (PP (IN again,) (S (VP (VBG focusing) (PP (IN on) (NP (VBG
> increasing) (NN trade.)))))))))))
> The countries broke off peace talks following the Mumbai attacks but have
> begun discussions again, focusing on increasing trade
>
> let me know how it works
>
> happy coding!
>
> Mark G
>
>
>
> On Thu, Sep 26, 2013 at 4:14 PM, Carlos Scheidecker <[email protected]> wrote:
>
> > Thanks Svetoslav,
> >
> > Would you have an example on that?
> >
> > cheers,
> >
> > Carlos.
> >
> >
> > On Thu, Sep 26, 2013 at 5:09 AM, Svetoslav Marinov <
> > [email protected]> wrote:
> >
> > > Hi Carlos,
> > >
> > > This is not exactly an answer to your question, but I am not really
> > > convinced that a phrase structure tree is the best way to extract
> > > triplets. A dependency graph is a much better option.
> > >
> > > There would be a number of NPs and PPs that are neither the subject nor
> > > the object, and I am not sure at all whether an adjective can be an
> > > object.
> > >
> > > However, if you want to use OpenNLP and the parse tree, maybe you can
> > > consider mapping the tree to FrameNet, thus you will see what kind of
> > > arguments a verb will have and which of these can potentially be the
> > > subject and the object.
> > >
> > > Best,
> > >
> > > Svetoslav
> > > ________________________________________
> > > From: Carlos Scheidecker <[email protected]>
> > > Sent: 26 September 2013 11:37
> > > To: [email protected]
> > > Subject: Triplet Extraction with OpenNLP
> > >
> > > Hello all,
> > >
> > > I am interested in performing Triplet Extraction.
> > >
> > > For that, I need to traverse the parse tree.
> > >
> > > I know how to use ChunkerME, but I am not sure how to use the Parser
> > > so that I can create a tree and traverse it.
> > >
> > > Ideally, I want to obtain the subject, predicate and object.
> > >
> > > To find the subject I need to search the NP subtree, selecting the
> > > first descendant of NP that is a noun via breadth-first search.
> > >
> > > To find the predicate I will search the VP subtree; the deepest verb
> > > descendant in that tree will give the predicate.
> > >
> > > The object(s) can be in 3 different subtrees: PP, NP and ADJ. In NP
> > > and PP they will be the first noun, while in ADJ we need to locate the
> > > first adjective.
> > >
> > > Therefore, what I need to learn is how to create the parser and the
> > > main tree so that I can navigate the subtrees.
> > >
> > > Thanks for the help,
> > >
> > > Carlos.
> > >
> >
>
