This is awesome, Mark, thanks! This will be quite useful for everybody else as well.
I ended up doing mine and went further with the other part of the extraction.

What I found interesting is how long it takes to load the model en-parser-chunking.bin, which is about 36 MB. So I am not loading it every time, only once during object creation. Does anyone have a better suggestion?

Cheers.

On Thu, Sep 26, 2013 at 4:59 PM, Mark G <[email protected]> wrote:

> Carlos, I threw this together to show how to get a Parser running.
> Look at what this prints; I think you may be able to iterate through
> topParses[] and traverse the tree. If there is a more efficient way, I am
> sure the other OpenNLPers will chime in.
>
>   import java.io.FileInputStream;
>   import java.io.IOException;
>   import java.io.InputStream;
>
>   import opennlp.tools.cmdline.parser.ParserTool;
>   import opennlp.tools.parser.Parse;
>   import opennlp.tools.parser.Parser;
>   import opennlp.tools.parser.ParserFactory;
>   import opennlp.tools.parser.ParserModel;
>   import opennlp.tools.util.InvalidFormatException;
>
>   public class ParserExample { // class name is arbitrary
>
>     public static void main(String[] args) throws InvalidFormatException, IOException {
>
>       // Load the parser model from disk (adjust the path to your setup).
>       InputStream is = new FileInputStream("c:\\temp\\opennlpmodels\\en-parser-chunking.bin");
>
>       ParserModel model = new ParserModel(is);
>       is.close();
>       Parser parser = ParserFactory.create(model);
>
>       String sentence = "The countries broke off peace talks following the Mumbai attacks but have begun discussions again, focusing on increasing trade.";
>       // Ask for the single best parse of the sentence.
>       Parse topParses[] = ParserTool.parseLine(sentence, parser, 1);
>
>       Parse p = topParses[0];
>       p.showCodeTree(); // one line per node: index path, type, parent, covered text
>       p.show();         // the bracketed tree
>       p.getParent();    // return values are unused here; these two calls
>       p.getChildren();  // are the ones to use when walking the tree
>
>       System.out.println(p.getText());
>     }
>   }
>
> It should print all this...
>
> [0] S 2092924121 -> 2092924121 TOP The countries broke off peace talks following the Mumbai attacks but have begun discussions again, focusing on increasing trade.
> [0.0] NP 2092766686 -> 2092924121 S The countries
> [0.0.0] DT 2092752996 -> 2092766686 NP The
> [0.0.0.0] TK 2092752996 -> 2092752996 DT The
> [0.0.1] NNS 2092969298 -> 2092766686 NP countries
> [0.0.1.0] TK 2092969298 -> 2092969298 NNS countries
> [0.1] VP 2093633263 -> 2092924121 S broke off peace talks following the Mumbai attacks but have begun discussions again, focusing on increasing trade.
> [0.1.0] VP 2093545647 -> 2093633263 VP broke off peace talks following the Mumbai attacks
> [0.1.0.0] VBD 2093484042 -> 2093545647 VP broke
> [0.1.0.0.0] TK 2093484042 -> 2093484042 VBD broke
> [0.1.0.1] PRT 2093793436 -> 2093545647 VP off
> [0.1.0.1.0] RP 2093793436 -> 2093793436 PRT off
> [0.1.0.1.0.0] TK 2093793436 -> 2093793436 RP off
> [0.1.0.2] NP 2094012476 -> 2093545647 VP peace talks
> [0.1.0.2.0] NN 2094004262 -> 2094012476 NP peace
> [0.1.0.2.0.0] TK 2094004262 -> 2094004262 NN peace
> [0.1.0.2.1] NNS 2094316394 -> 2094012476 NP talks
> [0.1.0.2.1.0] TK 2094316394 -> 2094316394 NNS talks
> [0.1.0.3] PP 2094660013 -> 2093545647 VP following the Mumbai attacks
> [0.1.0.3.0] VBG 2094634002 -> 2094660013 PP following
> [0.1.0.3.0.0] TK 2094634002 -> 2094634002 VBG following
> [0.1.0.3.1] NP 2095166543 -> 2094660013 PP the Mumbai attacks
> [0.1.0.3.1.0] DT 2095146008 -> 2095166543 NP the
> [0.1.0.3.1.0.0] TK 2095146008 -> 2095146008 DT the
> [0.1.0.3.1.1] NNP 2095358203 -> 2095166543 NP Mumbai
> [0.1.0.3.1.1.0] TK 2095358203 -> 2095358203 NNP Mumbai
> [0.1.0.3.1.2] NNS 2095723726 -> 2095166543 NP attacks
> [0.1.0.3.1.2.0] TK 2095723726 -> 2095723726 NNS attacks
> [0.1.1] CC 2096134426 -> 2093633263 VP but
> [0.1.1.0] TK 2096134426 -> 2096134426 CC but
> [0.1.2] VP 2096419178 -> 2093633263 VP have begun discussions again, focusing on increasing trade.
> [0.1.2.0] VBP 2096343883 -> 2096419178 VP have
> [0.1.2.0.0] TK 2096343883 -> 2096343883 VBP have
> [0.1.2.1] VP 2096672443 -> 2096419178 VP begun discussions again, focusing on increasing trade.
> [0.1.2.1.0] VBN 2096605362 -> 2096672443 VP begun
> [0.1.2.1.0.0] TK 2096605362 -> 2096605362 VBN begun
> [0.1.2.1.1] NP 2096925708 -> 2096672443 VP discussions
> [0.1.2.1.1.0] NNS 2096925708 -> 2096925708 NP discussions
> [0.1.2.1.1.0.0] TK 2096925708 -> 2096925708 NNS discussions
> [0.1.2.1.2] PP 2097584197 -> 2096672443 VP again, focusing on increasing trade.
> [0.1.2.1.2.0] IN 2097543127 -> 2097584197 PP again,
> [0.1.2.1.2.0.0] TK 2097543127 -> 2097543127 IN again,
> [0.1.2.1.2.1] S 2097938768 -> 2097584197 PP focusing on increasing trade.
> [0.1.2.1.2.1.0] VP 2097938768 -> 2097938768 S focusing on increasing trade.
> [0.1.2.1.2.1.0.0] VBG 2097910019 -> 2097938768 VP focusing
> [0.1.2.1.2.1.0.0.0] TK 2097910019 -> 2097910019 VBG focusing
> [0.1.2.1.2.1.0.1] PP 2098394645 -> 2097938768 VP on increasing trade.
> [0.1.2.1.2.1.0.1.0] IN 2098370003 -> 2098394645 PP on
> [0.1.2.1.2.1.0.1.0.0] TK 2098370003 -> 2098370003 IN on
> [0.1.2.1.2.1.0.1.1] NP 2098546604 -> 2098394645 PP increasing trade.
> [0.1.2.1.2.1.0.1.1.0] VBG 2098537021 -> 2098546604 NP increasing
> [0.1.2.1.2.1.0.1.1.0.0] TK 2098537021 -> 2098537021 VBG increasing
> [0.1.2.1.2.1.0.1.1.1] NN 2099103787 -> 2098546604 NP trade.
> [0.1.2.1.2.1.0.1.1.1.0] TK 2099103787 -> 2099103787 NN trade.
> (TOP (S (NP (DT The) (NNS countries)) (VP (VP (VBD broke) (PRT (RP off)) (NP (NN peace) (NNS talks)) (PP (VBG following) (NP (DT the) (NNP Mumbai) (NNS attacks)))) (CC but) (VP (VBP have) (VP (VBN begun) (NP (NNS discussions)) (PP (IN again,) (S (VP (VBG focusing) (PP (IN on) (NP (VBG increasing) (NN trade.)))))))))))
> The countries broke off peace talks following the Mumbai attacks but have begun discussions again, focusing on increasing trade
>
> Let me know how it works.
>
> Happy coding!
>
> Mark G
>
> On Thu, Sep 26, 2013 at 4:14 PM, Carlos Scheidecker <[email protected]> wrote:
>
> > Thanks Svetoslav,
> >
> > Would you have an example of that?
> >
> > Cheers,
> >
> > Carlos.
> >
> > On Thu, Sep 26, 2013 at 5:09 AM, Svetoslav Marinov <[email protected]> wrote:
> >
> > > Hi Carlos,
> > >
> > > This is not exactly an answer to your question, but I am not really convinced that a phrase structure tree is the best way to extract triplets. A dependency graph is a much better option.
> > >
> > > There would be a number of NPs and PPs that are neither the subject nor the object, and I am not sure at all whether an adjective can be an object.
> > >
> > > However, if you want to use OpenNLP and the parse tree, maybe you can consider mapping the tree to FrameNet, so that you can see what kind of arguments a verb takes and which of these can potentially be the subject and the object.
> > >
> > > Best,
> > >
> > > Svetoslav
> > > ________________________________________
> > > From: Carlos Scheidecker <[email protected]>
> > > Sent: 26 September 2013 11:37
> > > To: [email protected]
> > > Subject: Triplet Extraction with OpenNLP
> > >
> > > Hello all,
> > >
> > > I am interested in performing Triplet Extraction.
> > >
> > > For that, I need to traverse the parse tree.
> > >
> > > I know how to use the ChunkerME; however, I am not sure how to use the Parser so that I can create a tree to traverse.
> > >
> > > Ideally, I want to obtain the subject, predicate and object.
> > >
> > > To find the subject, I need to search the NP subtree, selecting the first descendant of NP that is a noun via breadth-first search.
> > >
> > > To find the predicate, I will search the VP subtree; the deepest verb descendant in that tree will give the predicate.
> > >
> > > Now for the object(s): they can be in 3 different subtrees, PP, NP and ADJ. In NP and PP they will be the first noun, while in ADJ we need to locate the first adjective.
> > >
> > > Therefore, what I need to learn is how to create the parser and the main tree so that I can navigate the subtrees.
> > >
> > > Thanks for the help,
> > >
> > > Carlos.
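For anyone following the thread, here is a rough sketch of the subject and predicate heuristics from the original question: breadth-first search for the first noun under the first NP, and the deepest verb under the first VP. It deliberately does not use the OpenNLP classes. Node, parse, bfs, subject and predicate are hypothetical names made up for this illustration, and the tree is read from the bracketed string that p.show() printed in Mark's message. The object step is left out because the thread does not pin down which subtree to search. With a real Parse you would presumably walk getChildren() and check getType() in the same way.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

public class TripletSketch {

    /** Stand-in for a parse node; NOT an OpenNLP class. */
    public static final class Node {
        public final String label;                  // e.g. "NP", "VBD"
        public String word;                         // set only on part-of-speech nodes
        public final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    /** Reads a well-formed bracketed tree such as "(S (NP (DT The) (NN dog)))". */
    public static Node parse(String s) {
        return parseNode(s.trim(), new int[] {0});
    }

    private static Node parseNode(String s, int[] pos) {
        pos[0]++;                                   // skip '('
        int start = pos[0];
        while (s.charAt(pos[0]) != ' ') pos[0]++;   // the label runs up to the first space
        Node node = new Node(s.substring(start, pos[0]));
        while (s.charAt(pos[0]) != ')') {
            char c = s.charAt(pos[0]);
            if (c == ' ') {
                pos[0]++;
            } else if (c == '(') {
                node.children.add(parseNode(s, pos));
            } else {                                // a bare token: the word of a POS node
                int w = pos[0];
                while (s.charAt(pos[0]) != ')' && s.charAt(pos[0]) != ' ') pos[0]++;
                node.word = s.substring(w, pos[0]);
            }
        }
        pos[0]++;                                   // skip ')'
        return node;
    }

    /** Breadth-first search: first node (root included) that matches, or null. */
    public static Node bfs(Node root, Predicate<Node> match) {
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node n = queue.poll();
            if (match.test(n)) return n;
            queue.addAll(n.children);
        }
        return null;
    }

    /** Subject heuristic: first noun (NN*) found breadth-first under the first NP. */
    public static String subject(Node root) {
        Node np = bfs(root, n -> n.label.equals("NP"));
        Node noun = (np == null) ? null : bfs(np, n -> n.label.startsWith("NN"));
        return noun == null ? null : noun.word;
    }

    /** Predicate heuristic: deepest verb (VB*) under the first VP. */
    public static String predicate(Node root) {
        Node vp = bfs(root, n -> n.label.equals("VP"));
        Node verb = (vp == null) ? null : deepest(vp, 0, new int[] {-1});
        return verb == null ? null : verb.word;
    }

    // Depth-first walk; best[0] holds the depth of the deepest verb seen so far.
    private static Node deepest(Node n, int depth, int[] best) {
        Node found = null;
        if (n.label.startsWith("VB") && depth > best[0]) {
            best[0] = depth;
            found = n;
        }
        for (Node c : n.children) {
            Node d = deepest(c, depth + 1, best);
            if (d != null) found = d;
        }
        return found;
    }

    public static void main(String[] args) {
        // The bracketed tree exactly as printed in Mark's message.
        Node root = parse("(TOP (S (NP (DT The) (NNS countries)) (VP (VP (VBD broke) (PRT (RP off)) "
            + "(NP (NN peace) (NNS talks)) (PP (VBG following) (NP (DT the) (NNP Mumbai) (NNS attacks)))) "
            + "(CC but) (VP (VBP have) (VP (VBN begun) (NP (NNS discussions)) (PP (IN again,) "
            + "(S (VP (VBG focusing) (PP (IN on) (NP (VBG increasing) (NN trade.)))))))))))");
        System.out.println("subject:   " + subject(root));
        System.out.println("predicate: " + predicate(root));
    }
}
```

On Mark's sentence this gives "countries" as the subject and "increasing" as the predicate. The second result shows how the deepest-verb rule can wander into an embedded clause on longer sentences, which seems in line with Svetoslav's reservations about phrase structure trees.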

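On the model-loading question raised in the latest reply: one common approach is to load the model lazily, exactly once, and share it for the lifetime of the application rather than per object. Below is a minimal sketch of that idea. Lazy and LoadOnce are hypothetical names for this illustration, and the loader here only simulates the expensive read. In real code the supplier would wrap something like new ParserModel(inputStream). As far as I know the model object can be shared while each thread should create its own Parser from it, but please check that against the OpenNLP documentation.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LoadOnce {

    /** Runs an expensive loader at most once, even when called from several threads. */
    public static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> loader;
        private volatile T value;                   // volatile makes the double check safe

        public Lazy(Supplier<T> loader) {
            this.loader = loader;
        }

        @Override
        public T get() {
            T v = value;
            if (v == null) {
                synchronized (this) {
                    v = value;
                    if (v == null) {                // only the first caller runs the loader
                        v = loader.get();
                        value = v;
                    }
                }
            }
            return v;
        }
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();

        // Stand-in for reading en-parser-chunking.bin; a real loader would
        // open the file and build the model here.
        Lazy<String> model = new Lazy<>(() -> {
            loads.incrementAndGet();
            return "pretend-parser-model";
        });

        model.get();
        model.get();
        model.get();
        System.out.println("loader ran " + loads.get() + " time(s)");
    }
}
```

The point of the wrapper is that callers never need to know whether the model has been loaded yet; the first get() pays the cost and later calls return the cached instance.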