Inside the Parse class, I think the showCodeTree() method does something close to what you want, at least as a starting point. It is a recursive method that traverses the children of the top Parse object. If you have the source code, look at the Parse class and its showCodeTree() method.

I was thinking you could build a sorted map (TreeMap) keyed by part of speech or chunk type, ordered by where each first appears in the sentence, with a TreeSet of the matching parts as the value for each key. That way you could take the first or last element of a set, depending on the position and type of the key.

Just a rough thought, though.

Mark G
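The map-of-sets idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Node` is a hypothetical stand-in for OpenNLP's `Parse` (with the real class you would read the tag from `getType()`, the children from `getChildren()` and the offset from `getSpan()`), and `ParseIndex`, `Part`, `index`, `buildIndex` and `sample` are made-up names. Note that a plain `TreeMap<String, ...>` orders its keys alphabetically; if first-mention order of the keys matters, a `LinkedHashMap` filled during the same traversal would preserve it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class ParseIndex {

    /** Hypothetical stand-in for opennlp.tools.parser.Parse: tag, covered text, start offset, children. */
    static final class Node {
        final String type;
        final String text;
        final int start;
        final List<Node> children;

        Node(String type, String text, int start, Node... children) {
            this.type = type;
            this.text = text;
            this.start = start;
            this.children = new ArrayList<>(Arrays.asList(children));
        }
    }

    /** A piece of the sentence, ordered by where it starts (ties broken by text). */
    static final class Part implements Comparable<Part> {
        final int start;
        final String text;

        Part(int start, String text) {
            this.start = start;
            this.text = text;
        }

        @Override
        public int compareTo(Part other) {
            int byStart = Integer.compare(start, other.start);
            return byStart != 0 ? byStart : text.compareTo(other.text);
        }
    }

    /** Recursively walks the tree and files every node under its tag. */
    static void index(Node node, Map<String, TreeSet<Part>> byType) {
        byType.computeIfAbsent(node.type, k -> new TreeSet<>())
              .add(new Part(node.start, node.text));
        for (Node child : node.children) {
            index(child, byType);
        }
    }

    static TreeMap<String, TreeSet<Part>> buildIndex(Node root) {
        TreeMap<String, TreeSet<Part>> byType = new TreeMap<>();
        index(root, byType);
        return byType;
    }

    /** Hand-built fragment of the parse from this thread: "The countries broke off peace talks". */
    static Node sample() {
        return new Node("S", "The countries broke off peace talks", 0,
            new Node("NP", "The countries", 0,
                new Node("DT", "The", 0),
                new Node("NNS", "countries", 4)),
            new Node("VP", "broke off peace talks", 14,
                new Node("VBD", "broke", 14),
                new Node("PRT", "off", 20,
                    new Node("RP", "off", 20)),
                new Node("NP", "peace talks", 24,
                    new Node("NN", "peace", 24),
                    new Node("NNS", "talks", 30))));
    }

    public static void main(String[] args) {
        TreeMap<String, TreeSet<Part>> byType = buildIndex(sample());
        System.out.println(byType.get("NP").first().text); // earliest NP
        System.out.println(byType.get("NP").last().text);  // latest NP
    }
}
```

With the sample tree, `first()` on the NP set gives "The countries" and `last()` gives "peace talks", which is the kind of first-or-last lookup described above.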
On Fri, Sep 27, 2013 at 3:09 AM, Carlos Scheidecker <[email protected]> wrote:

> This is awesome Mark, thanks!
>
> This will be quite useful for everybody else as well.
>
> I ended up doing mine and I went further with the other part of extraction.
>
> What I found interesting is the time it takes to load the
> model en-parser-chunking.bin, which is about 36 MB.
>
> So I am not loading it every time, but just once during object creation.
>
> Does anyone have a better suggestion?
>
> cheers.
>
>
> On Thu, Sep 26, 2013 at 4:59 PM, Mark G <[email protected]> wrote:
>
> > Carlos.. I threw this together to show how to get a Parser running.
> > Look at what this prints, I think you may be able to iterate through
> > topParses[] and traverse the tree. If there is a more efficient way I am
> > sure the other OpenNLPers will chime in.
> >
> >
> > public static void main(String[] args) throws InvalidFormatException, IOException {
> >
> >   InputStream is = new FileInputStream("c:\\temp\\opennlpmodels\\en-parser-chunking.bin");
> >
> >   ParserModel model = new ParserModel(is);
> >   is.close();
> >   Parser parser = ParserFactory.create(model);
> >
> >   String sentence = "The countries broke off peace talks following the Mumbai attacks but have begun discussions again, focusing on increasing trade.";
> >   Parse topParses[] = ParserTool.parseLine(sentence, parser, 1);
> >
> >   Parse p = topParses[0];
> >   p.showCodeTree();
> >   p.show();
> >   p.getParent();
> >   p.getChildren();
> >
> >   System.out.println(p.getText());
> > }
> >
> > It should print all this...
> >
> > [0] S 2092924121 -> 2092924121 TOP The countries broke off peace talks following the Mumbai attacks but have begun discussions again, focusing on increasing trade.
> > [0.0] NP 2092766686 -> 2092924121 S The countries
> > [0.0.0] DT 2092752996 -> 2092766686 NP The
> > [0.0.0.0] TK 2092752996 -> 2092752996 DT The
> > [0.0.1] NNS 2092969298 -> 2092766686 NP countries
> > [0.0.1.0] TK 2092969298 -> 2092969298 NNS countries
> > [0.1] VP 2093633263 -> 2092924121 S broke off peace talks following the Mumbai attacks but have begun discussions again, focusing on increasing trade.
> > [0.1.0] VP 2093545647 -> 2093633263 VP broke off peace talks following the Mumbai attacks
> > [0.1.0.0] VBD 2093484042 -> 2093545647 VP broke
> > [0.1.0.0.0] TK 2093484042 -> 2093484042 VBD broke
> > [0.1.0.1] PRT 2093793436 -> 2093545647 VP off
> > [0.1.0.1.0] RP 2093793436 -> 2093793436 PRT off
> > [0.1.0.1.0.0] TK 2093793436 -> 2093793436 RP off
> > [0.1.0.2] NP 2094012476 -> 2093545647 VP peace talks
> > [0.1.0.2.0] NN 2094004262 -> 2094012476 NP peace
> > [0.1.0.2.0.0] TK 2094004262 -> 2094004262 NN peace
> > [0.1.0.2.1] NNS 2094316394 -> 2094012476 NP talks
> > [0.1.0.2.1.0] TK 2094316394 -> 2094316394 NNS talks
> > [0.1.0.3] PP 2094660013 -> 2093545647 VP following the Mumbai attacks
> > [0.1.0.3.0] VBG 2094634002 -> 2094660013 PP following
> > [0.1.0.3.0.0] TK 2094634002 -> 2094634002 VBG following
> > [0.1.0.3.1] NP 2095166543 -> 2094660013 PP the Mumbai attacks
> > [0.1.0.3.1.0] DT 2095146008 -> 2095166543 NP the
> > [0.1.0.3.1.0.0] TK 2095146008 -> 2095146008 DT the
> > [0.1.0.3.1.1] NNP 2095358203 -> 2095166543 NP Mumbai
> > [0.1.0.3.1.1.0] TK 2095358203 -> 2095358203 NNP Mumbai
> > [0.1.0.3.1.2] NNS 2095723726 -> 2095166543 NP attacks
> > [0.1.0.3.1.2.0] TK 2095723726 -> 2095723726 NNS attacks
> > [0.1.1] CC 2096134426 -> 2093633263 VP but
> > [0.1.1.0] TK 2096134426 -> 2096134426 CC but
> > [0.1.2] VP 2096419178 -> 2093633263 VP have begun discussions again, focusing on increasing trade.
> > [0.1.2.0] VBP 2096343883 -> 2096419178 VP have
> > [0.1.2.0.0] TK 2096343883 -> 2096343883 VBP have
> > [0.1.2.1] VP 2096672443 -> 2096419178 VP begun discussions again, focusing on increasing trade.
> > [0.1.2.1.0] VBN 2096605362 -> 2096672443 VP begun
> > [0.1.2.1.0.0] TK 2096605362 -> 2096605362 VBN begun
> > [0.1.2.1.1] NP 2096925708 -> 2096672443 VP discussions
> > [0.1.2.1.1.0] NNS 2096925708 -> 2096925708 NP discussions
> > [0.1.2.1.1.0.0] TK 2096925708 -> 2096925708 NNS discussions
> > [0.1.2.1.2] PP 2097584197 -> 2096672443 VP again, focusing on increasing trade.
> > [0.1.2.1.2.0] IN 2097543127 -> 2097584197 PP again,
> > [0.1.2.1.2.0.0] TK 2097543127 -> 2097543127 IN again,
> > [0.1.2.1.2.1] S 2097938768 -> 2097584197 PP focusing on increasing trade.
> > [0.1.2.1.2.1.0] VP 2097938768 -> 2097938768 S focusing on increasing trade.
> > [0.1.2.1.2.1.0.0] VBG 2097910019 -> 2097938768 VP focusing
> > [0.1.2.1.2.1.0.0.0] TK 2097910019 -> 2097910019 VBG focusing
> > [0.1.2.1.2.1.0.1] PP 2098394645 -> 2097938768 VP on increasing trade.
> > [0.1.2.1.2.1.0.1.0] IN 2098370003 -> 2098394645 PP on
> > [0.1.2.1.2.1.0.1.0.0] TK 2098370003 -> 2098370003 IN on
> > [0.1.2.1.2.1.0.1.1] NP 2098546604 -> 2098394645 PP increasing trade.
> > [0.1.2.1.2.1.0.1.1.0] VBG 2098537021 -> 2098546604 NP increasing
> > [0.1.2.1.2.1.0.1.1.0.0] TK 2098537021 -> 2098537021 VBG increasing
> > [0.1.2.1.2.1.0.1.1.1] NN 2099103787 -> 2098546604 NP trade.
> > [0.1.2.1.2.1.0.1.1.1.0] TK 2099103787 -> 2099103787 NN trade.
> > (TOP (S (NP (DT The) (NNS countries)) (VP (VP (VBD broke) (PRT (RP off)) (NP (NN peace) (NNS talks)) (PP (VBG following) (NP (DT the) (NNP Mumbai) (NNS attacks)))) (CC but) (VP (VBP have) (VP (VBN begun) (NP (NNS discussions)) (PP (IN again,) (S (VP (VBG focusing) (PP (IN on) (NP (VBG increasing) (NN trade.)))))))))))
> > The countries broke off peace talks following the Mumbai attacks but have begun discussions again, focusing on increasing trade
> >
> > let me know how it works
> >
> > happy coding!
> >
> > Mark G
> >
> >
> > On Thu, Sep 26, 2013 at 4:14 PM, Carlos Scheidecker <[email protected]> wrote:
> >
> > > Thanks Svetoslav,
> > >
> > > Would you have an example of that?
> > >
> > > cheers,
> > >
> > > Carlos.
> > >
> > >
> > > On Thu, Sep 26, 2013 at 5:09 AM, Svetoslav Marinov <[email protected]> wrote:
> > >
> > > > Hi Carlos,
> > > >
> > > > This is not exactly an answer to your question, but I am not really
> > > > convinced that a phrase structure tree is the best way to extract
> > > > triplets. A dependency graph is a much better option.
> > > >
> > > > There would be a number of NPs and PPs that are neither the subject nor
> > > > the object, and I am not sure at all whether an adjective can be an object.
> > > >
> > > > However, if you want to use OpenNLP and the parse tree, maybe you can
> > > > consider mapping the tree to FrameNet, so that you can see what kind of
> > > > arguments a verb will have and which of these can potentially be the
> > > > subject and the object.
> > > >
> > > > Best,
> > > >
> > > > Svetoslav
> > > > ________________________________________
> > > > From: Carlos Scheidecker <[email protected]>
> > > > Sent: 26 September 2013 11:37
> > > > To: [email protected]
> > > > Subject: Triplet Extraction with OpenNLP
> > > >
> > > > Hello all,
> > > >
> > > > I am interested in performing Triplet Extraction.
> > > >
> > > > For that, I need to traverse the parse tree.
> > > >
> > > > I know how to use the ChunkerME; however, I am not sure how to use the
> > > > Parser so that I can create a tree to traverse.
> > > >
> > > > Ideally, I want to obtain the subject, predicate and object.
> > > >
> > > > To find the subject I need to search the NP subtree, selecting the
> > > > first descendant of NP that is a noun via breadth-first search.
> > > >
> > > > To find the predicate I will search the VP subtree; the deepest verb
> > > > descendant in that tree will give the predicate.
> > > >
> > > > Now for the object(s): they can be in 3 different subtrees, PP, NP and
> > > > ADJP. In NP and PP the object will be the first noun, while in the ADJP
> > > > we need to locate the first adjective.
> > > >
> > > > Therefore, what I need to learn is how to create the parser and the
> > > > main tree so that I can navigate the subtrees.
> > > >
> > > > Thanks for the help,
> > > >
> > > > Carlos.
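The three extraction rules in the original message (first noun under NP by breadth-first search, deepest verb under VP, first noun or adjective under a PP, NP or ADJP inside the VP) can be sketched as below. This is a rough illustration, not a tested OpenNLP integration: `Node` is a hypothetical stand-in for a parse node that keeps the word directly on the POS node (OpenNLP puts it on a TK child instead), the tag prefixes "NN", "VB" and "JJ" assume Penn Treebank tags, and all class and method names here are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class TripletSketch {

    /** Hypothetical parse node: phrase nodes have children, POS nodes carry a word. */
    static final class Node {
        final String tag;
        final String word; // null on phrase-level nodes
        final List<Node> children;

        Node(String tag, String word, Node... children) {
            this.tag = tag;
            this.word = word;
            this.children = Arrays.asList(children);
        }
    }

    static Node leaf(String tag, String word) { return new Node(tag, word); }
    static Node phrase(String tag, Node... children) { return new Node(tag, null, children); }

    /** Breadth-first search for the first word whose tag starts with the prefix. */
    static String firstWord(Node root, String tagPrefix) {
        if (root == null) return null;
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node n = queue.remove();
            if (n.word != null && n.tag.startsWith(tagPrefix)) return n.word;
            queue.addAll(n.children);
        }
        return null;
    }

    /** Depth-first search for the deepest verb; the earliest one wins a tie. */
    static String deepestVerb(Node root) {
        Node[] best = new Node[1];
        int[] bestDepth = {-1};
        visit(root, 0, best, bestDepth);
        return best[0] == null ? null : best[0].word;
    }

    private static void visit(Node n, int depth, Node[] best, int[] bestDepth) {
        if (n == null) return;
        if (n.word != null && n.tag.startsWith("VB") && depth > bestDepth[0]) {
            best[0] = n;
            bestDepth[0] = depth;
        }
        for (Node c : n.children) visit(c, depth + 1, best, bestDepth);
    }

    /** First NP/PP (noun) or ADJP (adjective) below the VP, in breadth-first order. */
    static String object(Node vp) {
        Deque<Node> queue = new ArrayDeque<>(vp.children);
        while (!queue.isEmpty()) {
            Node n = queue.remove();
            String found = null;
            if (n.tag.equals("NP") || n.tag.equals("PP")) found = firstWord(n, "NN");
            else if (n.tag.equals("ADJP")) found = firstWord(n, "JJ");
            if (found != null) return found;
            queue.addAll(n.children);
        }
        return null;
    }

    static Node child(Node parent, String tag) {
        for (Node c : parent.children) if (c.tag.equals(tag)) return c;
        return null;
    }

    /** Returns {subject, predicate, object}; an entry is null when nothing matched. */
    static String[] extract(Node sentence) {
        Node np = child(sentence, "NP");
        Node vp = child(sentence, "VP");
        return new String[] {
            firstWord(np, "NN"), deepestVerb(vp), vp == null ? null : object(vp)
        };
    }

    /** Hand-built fragment of the parse from this thread. */
    static Node sample() {
        return phrase("S",
            phrase("NP", leaf("DT", "The"), leaf("NNS", "countries")),
            phrase("VP", leaf("VBD", "broke"),
                phrase("PRT", leaf("RP", "off")),
                phrase("NP", leaf("NN", "peace"), leaf("NNS", "talks"))));
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", extract(sample())));
    }
}
```

On the sample fragment the rules give "countries" as subject, "broke" as predicate and "peace" as object. The object is "peace" rather than the head noun "talks" because the rule takes the first noun it meets, which is one of the weaknesses of working from a phrase structure tree that Svetoslav points out.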
