> Hi Lewis,
>
> We are not using Ant and Ivy for the build anymore. If you try building
> with the Maven commands, it will work fine.

Thanks. Sorry, I had missed that. Are you planning to get rid of the Ant and Ivy related stuff? That would make things a bit clearer.

> > Finally, a very simple question: some backends have indices (SQL of
> > course, but Cassandra and HBase?) that could be used when querying.
> > Typically in Nutch we'd want to retrieve all the docs having a specific
> > flag like fetched in order to parse them. Is this implemented? I am sure
> > the answer is in the code somewhere, but it is good to have a trace on
> > the mailing list for future reference.
>
> Well, in gora-cassandra a field such as fetched (fetchedTime?) would be
> defined as a column in the database, therefore it would be possible to
> execute queries normally. However, I think you are maybe talking about
> something like GORA-119? Can you review and confirm?
>
> https://issues.apache.org/jira/browse/GORA-119

Yes, I think that covers it. Basically, what I want to check is whether we scan the whole dataset and filter on the fly, or use queries on the backend side to return only what is needed. I think this would make a substantial difference in performance and would be a perfect illustration of what Nutch 2.x does that 1.x can't.

Thanks

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
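
P.S. To make the scan-versus-query distinction concrete, here is a rough sketch in Python. Everything in it is hypothetical: `ToyStore`, `scan_and_filter`, `query_by_index` and the `status` field are made-up names for an in-memory stand-in, not Gora's or Nutch's API. It only shows how many records each approach has to touch when a small fraction of the docs carries the flag.

```python
from collections import defaultdict


class ToyStore:
    """Hypothetical in-memory stand-in for a backend; not Gora's API."""

    def __init__(self):
        self.rows = {}                     # key -> record dict
        self.by_status = defaultdict(set)  # secondary index: status -> keys
        self.rows_read = 0                 # records touched by the last query

    def put(self, key, status):
        # Keep the secondary index consistent when a record is overwritten.
        old = self.rows.get(key)
        if old is not None:
            self.by_status[old["status"]].discard(key)
        self.rows[key] = {"status": status}
        self.by_status[status].add(key)

    def scan_and_filter(self, status):
        # Read every record and filter on the client side.
        self.rows_read = 0
        hits = []
        for key, rec in self.rows.items():
            self.rows_read += 1
            if rec["status"] == status:
                hits.append(key)
        return sorted(hits)

    def query_by_index(self, status):
        # Let the "backend" return only the matching keys via its index.
        keys = self.by_status.get(status, set())
        self.rows_read = len(keys)
        return sorted(keys)


store = ToyStore()
for i in range(1000):
    # Assumption for the demo: 1 doc in 100 is flagged as fetched.
    store.put(f"http://example.com/{i}", "fetched" if i % 100 == 0 else "unfetched")

scanned = store.scan_and_filter("fetched")
scan_cost = store.rows_read
indexed = store.query_by_index("fetched")
index_cost = store.rows_read

print(scan_cost, index_cost)  # records touched: full scan vs index lookup
```

Both calls return the same keys; the difference is only in how many records are read to get there, which is the cost I would like to see pushed to the backend.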

