Hi Lewis,

> We are not using Ant and Ivy for the build anymore. If you try building
> with the Maven commands it will work fine.


Thanks. Sorry, I had missed that. Are you planning to get rid of the Ant and
Ivy related stuff? That would make things a bit clearer.


> >
> > Finally a very simple question: some backends have indices (SQL of course,
> > but Cassandra and HBase?) that could be used when querying. Typically in
> > Nutch we'd want to retrieve all the docs having a specific flag like
> > fetched in order to parse them. Is this implemented? I am sure the answer is
> > in the code somewhere but it is good to have a trace on the mailing list
> > for future reference.
> >
>
> Well, in gora-cassandra a field such as fetched (fetchedTime?) would be
> defined as a column in the database, so it would be possible to
> execute queries normally. However, I think you are maybe talking about
> something like GORA-119? Can you review and confirm?
>
> https://issues.apache.org/jira/browse/GORA-119
>
>
Yes, I think that covers it. Basically what I want to check is whether we
scan the whole dataset and filter on the fly or use queries on the back end
side to return only what is needed. I think this would make a substantial
difference in performance and would be a perfect illustration of what Nutch
2.x does that 1.x can't.
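
To make the distinction concrete, here is a rough toy sketch of the two
strategies. It is not the Gora API: ToyStore, scan_and_filter,
query_by_index and the "status" field are all names made up for
illustration, and the store is just an in-memory dict with a hand-rolled
secondary index. It only counts how many records each approach has to read.

```python
from collections import defaultdict


class ToyStore:
    """Hypothetical in-memory store; NOT the Gora API, illustration only."""

    def __init__(self):
        self.rows = {}                      # key -> record dict
        self.by_status = defaultdict(set)   # secondary index: status -> keys
        self.rows_read = 0                  # records handed back to the client

    def put(self, key, status):
        # Keep the secondary index in sync when a record is overwritten.
        old = self.rows.get(key)
        if old is not None:
            self.by_status[old["status"]].discard(key)
        self.rows[key] = {"status": status}
        self.by_status[status].add(key)

    def scan_and_filter(self, status):
        # Client-side filtering: every record is read, most are discarded.
        hits = []
        for key, rec in self.rows.items():
            self.rows_read += 1
            if rec["status"] == status:
                hits.append(key)
        return sorted(hits)

    def query_by_index(self, status):
        # Back-end filtering: only the matching records are read.
        keys = sorted(self.by_status.get(status, ()))
        self.rows_read += len(keys)
        return keys


store = ToyStore()
for i in range(1000):
    # One record in every hundred is marked as fetched.
    store.put(f"url{i:04d}", "fetched" if i % 100 == 0 else "unfetched")

store.rows_read = 0
scanned = store.scan_and_filter("fetched")
scan_cost = store.rows_read

store.rows_read = 0
indexed = store.query_by_index("fetched")
index_cost = store.rows_read

print(f"scan read {scan_cost} records, index read {index_cost} records")
```

Both calls return the same keys; the only difference is how many records
have to be touched to get them, which is the cost I am asking about.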

Thanks

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble