Another important question is whether this is something that is Mahout-ish.
Mahout is a project that supports scalable data mining. That currently includes a mature recommendation framework, less mature clustering and classification tools, and a smattering of other tools. What you are proposing sounds more like an application assembled from different tools, possibly some from Mahout and some from other sources. How do you see this?

On Wed, Mar 23, 2011 at 9:37 AM, Ted Dunning <[email protected]> wrote:

> Let's take this back to the mailing list so all can see.
>
> If you are familiar with the Stanford parser, then this seems like a
> feasible project for you to accomplish. I would expect that very similar
> results could be achieved using simple word or phrase counts, possibly with
> the addition of a chunker. My guess is that the parser would add very
> little.
>
> Stefan Henß did some interesting and very simple work, for instance, for
> automated FAQ generation that avoided parsing:
>
> http://mail-archives.apache.org/mod_mbox/mahout-user/201102.mbox/%[email protected]%3E
>
> On Wed, Mar 23, 2011 at 3:24 AM, Harsh <[email protected]> wrote:
>
>> I want to build on the Stanford parser (the one I am familiar with) and
>> create a dependency graph for the sentences. The most frequent words in
>> any paragraph generally depict its theme. With the help of the
>> dependency graph and word counts, I want to guess the theme of the
>> paragraph.
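To make the "simple word or phrase counts" baseline from the quoted reply concrete, here is a minimal sketch. It is not Mahout code and not Stefan Henß's method; it assumes regex tokenization, a tiny hypothetical stopword list, and adjacent pairs of non-stopword tokens as a crude stand-in for the phrases a real chunker would produce. No parser or dependency graph is involved.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "it", "for",
    "on", "with", "that", "this", "as", "are", "was", "be", "by",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def guess_theme(paragraph, top_n=3):
    """Return (top words, top phrases) for a paragraph by raw frequency.

    Words: tokens that are not stopwords.
    Phrases: adjacent token pairs where neither token is a stopword,
    a rough substitute for noun-phrase chunking (assumption).
    """
    tokens = tokenize(paragraph)
    words = [t for t in tokens if t not in STOPWORDS]
    phrases = [
        f"{a} {b}"
        for a, b in zip(tokens, tokens[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    ]
    # Counter.most_common breaks ties by first occurrence in the text.
    return Counter(words).most_common(top_n), Counter(phrases).most_common(top_n)


if __name__ == "__main__":
    text = ("Mahout provides scalable machine learning. Scalable machine learning "
            "needs distributed storage, and Mahout runs scalable jobs on Hadoop.")
    top_words, top_phrases = guess_theme(text)
    print(top_words)
    print(top_phrases)
```

Counting content-word pairs rather than single words is what lets a phrase like "machine learning" surface as a theme; swapping in a real chunker would replace the bigram step without changing the counting logic.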
