On 15/11/2011 20:33, Markus Jelsma wrote:
It's back again! Last try, in case someone has a pointer for this.
Cheers

After some DB updates, they're gone! Does anyone recognize this phenomenon?

On Tuesday 08 November 2011 11:22:48 Markus Jelsma wrote:
On Tuesday 08 November 2011 11:15:37 Markus Jelsma wrote:
Hi guys,

I have an M/R job selecting only DB_FETCHED and DB_NOTMODIFIED records and
their signatures. I had to add a sanity check on the signature to avoid an
NPE. I had assumed that any record with such a DB_ status has to have a
signature, right?
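
For readers following the thread, the sanity check described above might look roughly like the sketch below. It is not the actual job: the `Record` class, the status strings, and `select_signatures` are hypothetical stand-ins, and plain Python is used in place of the real M/R code so the example is self-contained. It only illustrates selecting the two statuses and setting aside records whose signature is missing instead of failing on them.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical stand-ins for the CrawlDb statuses named in the message.
DB_FETCHED = "db_fetched"
DB_NOTMODIFIED = "db_notmodified"
DB_UNFETCHED = "db_unfetched"
WANTED = {DB_FETCHED, DB_NOTMODIFIED}


@dataclass
class Record:
    """Hypothetical, simplified view of one CrawlDb entry."""
    url: str
    status: str
    signature: Optional[bytes]  # None models the missing signature


def select_signatures(
    records: Iterable[Record], missing: List[str]
) -> Iterator[Tuple[str, str]]:
    """Yield (url, hex signature) for fetched / not-modified records.

    Records with a wanted status but no signature are collected in
    `missing` rather than dereferenced, which is the sanity check.
    """
    for rec in records:
        if rec.status not in WANTED:
            continue
        if rec.signature is None:
            missing.append(rec.url)
            continue
        yield rec.url, rec.signature.hex()


# Made-up sample data, for illustration only.
records = [
    Record("http://example.org/a", DB_FETCHED, bytes([0xAB, 0xCD])),
    Record("http://example.org/b", DB_NOTMODIFIED, None),
    Record("http://example.org/c", DB_UNFETCHED, None),
    Record("http://example.org/d", DB_NOTMODIFIED, bytes([0x01])),
]
missing: List[str] = []
selected = list(select_signatures(records, missing))
```

Keeping the URLs of the signature-less records, rather than silently dropping them, makes it possible to look at where those pages come from afterwards.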

Why does roughly 0.0001625% of my records exist without a signature?

Now with correct metrics:
Why does roughly 0.000084% of my records exist without a signature?

This could be related to pages that come from redirects: when they are fetched, they are accounted for under different URLs, which in turn may confuse the update code in CrawlDbReducer. Do you notice any pattern to these pages? What's their origin?

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
