Let's take this back to the mailing list so everyone can see. If you are familiar with the Stanford parser, then this seems like a feasible project for you to accomplish. That said, I would expect very similar results from simple word or phrase counts, possibly with the addition of a chunker. My guess is that the parser would add very little.
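
To make the word-count idea concrete, here is a rough sketch of the kind of baseline I mean. It is only an illustration in Python, not anything taken from Mahout or the Stanford tools: the function name guess_theme, the tiny stopword list, and the sample text are all made up for the example, and a real version would need a proper stopword list and probably phrase counts as well.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real list would be far larger).
STOPWORDS = frozenset({
    "a", "an", "and", "the", "to", "of", "in", "is", "its", "it",
    "each", "every", "for", "with", "that", "this", "on", "as",
})


def guess_theme(paragraph, top_n=3):
    """Return the top_n most frequent non-stopword tokens in the paragraph."""
    # Lowercase and split into alphabetic tokens (apostrophes kept inside words).
    tokens = re.findall(r"[a-z']+", paragraph.lower())
    # Count only tokens that are not stopwords and are longer than two characters.
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_n)]


text = (
    "The parser builds a dependency graph. The graph links each word "
    "to its head, and the parser scores every graph edge."
)
print(guess_theme(text, top_n=2))
```

If counts like these already pick out the theme, the dependency graph would have to earn its cost by improving on them.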
Stefan Henß did some interesting and very simple work, for instance, for automated FAQ generation that avoided parsing:

http://mail-archives.apache.org/mod_mbox/mahout-user/201102.mbox/%[email protected]%3E

On Wed, Mar 23, 2011 at 3:24 AM, Harsh <[email protected]> wrote:

> I want to build over the Stanford parser (the one I am familiar with) and
> want to create a dependency graph for the sentences. The most occurring
> words in any paragraph generally depicts its theme. With the help of the
> dependency developed and word count, I want to guess the theme of the
> paragraph.
