Tulay Muezzinoglu created NUTCH-2440:
----------------------------------------
Summary: DbResource does not accept crawlid
Key: NUTCH-2440
URL: https://issues.apache.org/jira/browse/NUTCH-2440
Project: Nutch
Issue Type: Bug
Components: REST_api
Affects Versions: 2.3, 2.4
Reporter: Tulay Muezzinoglu
Priority: Critical
Fix For: 2.4
DbResource initializes its DbReaders with a null crawlid. This prevents
querying the correct table/collection when a crawlid was set during fetch.
For example, in MongoDB all data is stored in the "webpage" collection by
default. Let's say you set the crawlid to "tech" for a fetch; all data then
gets stored in the "tech_webpage" collection. But a REST call to the /db
endpoint cannot specify a crawlid, so it queries the "webpage" collection.
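
To make the naming rule above concrete, here is a minimal sketch of how a
crawlid could map to a collection name. The class CrawlIdCollections and the
method resolveCollection are hypothetical names used only for illustration;
they are not part of Nutch or Gora. The "crawlid + underscore" rule is taken
from the "tech" / "tech_webpage" example in this report.

```java
// Hypothetical helper, not Nutch code: illustrates the naming described above.
final class CrawlIdCollections {

    // Returns the collection name for a given crawlid.
    // A null or empty crawlid falls back to the unprefixed base name,
    // which mirrors what happens when DbResource passes a null crawlid.
    static String resolveCollection(String crawlId, String baseName) {
        if (crawlId == null || crawlId.isEmpty()) {
            return baseName;
        }
        return crawlId + "_" + baseName;
    }

    public static void main(String[] args) {
        // Fetch ran with crawlid "tech": data lives in "tech_webpage".
        System.out.println(resolveCollection("tech", "webpage"));
        // The /db call has no crawlid, so it looks in "webpage" instead.
        System.out.println(resolveCollection(null, "webpage"));
    }
}
```

The two calls in main return different names, which is the mismatch this
issue describes: the data is written under one name and read under another.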
I am thinking that either DBFilter can be changed to read in a crawlid, or
the resource path can include the crawlid. I am open to suggestions and can
then make a PR.
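
As a rough sketch of the two options just mentioned, the code below shows
where the crawlid could come from in each case. Everything here is
hypothetical: FilterWithCrawlId stands in for a filter object extended with a
crawlid field, and crawlIdFromPath assumes a path shape of /db/<crawlid>.
Neither reflects the existing Nutch REST API.

```java
import java.util.Optional;

// Hypothetical sketch, not Nutch code: two possible sources for the crawlid.
final class CrawlIdSources {

    // Option 1: the request filter carries an optional crawlid field.
    static final class FilterWithCrawlId {
        private final String crawlId; // may be null when the client omits it

        FilterWithCrawlId(String crawlId) {
            this.crawlId = crawlId;
        }

        // Empty when the crawlid is missing or blank-length.
        Optional<String> crawlId() {
            return Optional.ofNullable(crawlId).filter(s -> !s.isEmpty());
        }
    }

    // Option 2: the crawlid is a single path segment after "/db",
    // for example "/db/tech" (assumed path shape).
    static Optional<String> crawlIdFromPath(String path) {
        final String prefix = "/db";
        if (path == null || !path.startsWith(prefix)) {
            return Optional.empty();
        }
        String rest = path.substring(prefix.length());
        // Reject paths such as "/dbfoo" that merely share the prefix.
        if (!rest.isEmpty() && !rest.startsWith("/")) {
            return Optional.empty();
        }
        // Drop the leading slash and one trailing slash, if present.
        if (rest.startsWith("/")) {
            rest = rest.substring(1);
        }
        if (rest.endsWith("/")) {
            rest = rest.substring(0, rest.length() - 1);
        }
        // Exactly one non-empty segment is accepted as the crawlid.
        if (rest.isEmpty() || rest.contains("/")) {
            return Optional.empty();
        }
        return Optional.of(rest);
    }
}
```

In both variants an absent crawlid yields an empty Optional, so existing
clients that never send one would keep querying the default collection.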