[ http://issues.apache.org/jira/browse/NUTCH-172?page=all ]
Sami Siren closed NUTCH-172.
Segment merger
--
Key: NUTCH-172
URL: http://issues.apache.org/jira/browse/NUTCH-172
Project: Nutch
[ http://issues.apache.org/jira/browse/NUTCH-178?page=all ]
Sami Siren closed NUTCH-178.
in search.jsp must be session creation false
--
Key: NUTCH-178
URL:
[ http://issues.apache.org/jira/browse/NUTCH-184?page=all ]
Sami Siren closed NUTCH-184.
Serbian (sr, Cyrilic) and Serbo-Croatian (sh, Latin) translation
Key: NUTCH-184
[ http://issues.apache.org/jira/browse/NUTCH-160?page=all ]
Sami Siren closed NUTCH-160.
Use standard Java Regex library rather than org.apache.oro.text.regex
-
Key:
[ http://issues.apache.org/jira/browse/NUTCH-177?page=all ]
Sami Siren closed NUTCH-177.
Default installation seems to produce working entity of nutch
-
Key: NUTCH-177
[ http://issues.apache.org/jira/browse/NUTCH-137?page=all ]
Sami Siren closed NUTCH-137.
footer is not displayed in search result page
-
Key: NUTCH-137
URL:
[ http://issues.apache.org/jira/browse/NUTCH-197?page=all ]
Sami Siren closed NUTCH-197.
NullPointerException in TaskRunner if application jar does not have lib directory
---
[ http://issues.apache.org/jira/browse/NUTCH-221?page=all ]
Sami Siren closed NUTCH-221.
prepare nutch for upcoming lucene 2.0
-
Key: NUTCH-221
URL:
[ http://issues.apache.org/jira/browse/NUTCH-193?page=all ]
Sami Siren closed NUTCH-193.
move NDFS and MapReduce to a separate project
-
Key: NUTCH-193
URL:
[ http://issues.apache.org/jira/browse/NUTCH-201?page=all ]
Sami Siren closed NUTCH-201.
add support for subcollections
--
Key: NUTCH-201
URL: http://issues.apache.org/jira/browse/NUTCH-201
[ http://issues.apache.org/jira/browse/NUTCH-200?page=all ]
Sami Siren closed NUTCH-200.
OpenSearch Servlet ist broken
-
Key: NUTCH-200
URL: http://issues.apache.org/jira/browse/NUTCH-200
[ http://issues.apache.org/jira/browse/NUTCH-211?page=all ]
Sami Siren closed NUTCH-211.
FetchedSegments leave readers open
--
Key: NUTCH-211
URL:
[ http://issues.apache.org/jira/browse/NUTCH-212?page=all ]
Sami Siren closed NUTCH-212.
ant build problem with locale-sr
Key: NUTCH-212
URL: http://issues.apache.org/jira/browse/NUTCH-212
[ http://issues.apache.org/jira/browse/NUTCH-209?page=all ]
Sami Siren closed NUTCH-209.
include nutch jar in mapred jobs
Key: NUTCH-209
URL: http://issues.apache.org/jira/browse/NUTCH-209
[ http://issues.apache.org/jira/browse/NUTCH-257?page=all ]
Sami Siren closed NUTCH-257.
Summary#toString always Entity encodes -- problem for OpenSearchServlet#description field
[ http://issues.apache.org/jira/browse/NUTCH-280?page=all ]
Sami Siren closed NUTCH-280.
url query causes NullPointerException
-
Key: NUTCH-280
URL:
[ http://issues.apache.org/jira/browse/NUTCH-292?page=all ]
Sami Siren closed NUTCH-292.
OpenSearchServlet: OutOfMemoryError: Java heap space
Key: NUTCH-292
URL:
[ http://issues.apache.org/jira/browse/NUTCH-250?page=all ]
Sami Siren closed NUTCH-250.
Generate to log truncation caused by generate.max.per.host
--
Key: NUTCH-250
[ http://issues.apache.org/jira/browse/NUTCH-298?page=all ]
Sami Siren closed NUTCH-298.
if a 404 for a robots.txt is returned a NPE is thrown
-
Key: NUTCH-298
URL:
[ http://issues.apache.org/jira/browse/NUTCH-303?page=all ]
Sami Siren closed NUTCH-303.
logging improvements
Key: NUTCH-303
URL: http://issues.apache.org/jira/browse/NUTCH-303
Project:
[ http://issues.apache.org/jira/browse/NUTCH-306?page=all ]
Sami Siren closed NUTCH-306.
DistributedSearch.Client liveAddresses concurrency problem
--
Key: NUTCH-306
[ http://issues.apache.org/jira/browse/NUTCH-302?page=all ]
Sami Siren closed NUTCH-302.
java doc of CrawlDb is wrong
Key: NUTCH-302
URL: http://issues.apache.org/jira/browse/NUTCH-302
[ http://issues.apache.org/jira/browse/NUTCH-301?page=all ]
Sami Siren closed NUTCH-301.
CommonGrams loads analysis.common.terms.file for each query
---
Key: NUTCH-301
[ http://issues.apache.org/jira/browse/NUTCH-307?page=all ]
Sami Siren closed NUTCH-307.
wrong configured log4j.properties
-
Key: NUTCH-307
URL: http://issues.apache.org/jira/browse/NUTCH-307
[ http://issues.apache.org/jira/browse/NUTCH-320?page=all ]
Sami Siren closed NUTCH-320.
DmozParser does not output urls to stdout
-
Key: NUTCH-320
URL:
[ http://issues.apache.org/jira/browse/NUTCH-319?page=all ]
Sami Siren closed NUTCH-319.
OPICScoringFilter should use logging API instead of printStackTrace
---
Key: NUTCH-319
[ http://issues.apache.org/jira/browse/NUTCH-328?page=all ]
Sami Siren closed NUTCH-328.
commons-cli-2.0-SNAPSHOT.jar provided with nutch is not compatible with jdk 1.4
---
[ http://issues.apache.org/jira/browse/NUTCH-312?page=all ]
Sami Siren closed NUTCH-312.
Fix for upcoming incompatibility with Hadoop-0.4
Key: NUTCH-312
URL:
[ http://issues.apache.org/jira/browse/NUTCH-327?page=all ]
Sami Siren closed NUTCH-327.
bin/nutch setting of log path problems on cygwin
Key: NUTCH-327
URL:
[ http://issues.apache.org/jira/browse/NUTCH-317?page=all ]
Sami Siren closed NUTCH-317.
Clarify what the queryLanguage argument of Query.parse(...) means
-
Key: NUTCH-317
[ http://issues.apache.org/jira/browse/NUTCH-379?page=all ]
Sami Siren updated NUTCH-379:
-
Fix Version/s: (was: 0.8.1)
(was: 0.8)
cannot fix released versions
ParseUtil does not pass through the content's URL to the
[ http://issues.apache.org/jira/browse/NUTCH-339?page=comments#action_12442195 ]
Sami Siren commented on NUTCH-339:
--
[[ Old comment, sent by email on Sun, 06 Aug 2006 08:06:13 +0300 ]]
The original Fetcher is no longer being polite?
Other
Link to 0.8.x apidocs broken on website
---
Key: NUTCH-375
URL: http://issues.apache.org/jira/browse/NUTCH-375
Project: Nutch
Issue Type: Bug
Components: documentation
Reporter: Sami
[ http://issues.apache.org/jira/browse/NUTCH-375?page=all ]
Sami Siren resolved NUTCH-375.
--
Resolution: Fixed
this was fixed by copying the apidocs from 0.8.1 to
/www/lucene.apache.org/nutch/apidocs-0.8.x/
as soon as the next rsync occurs it should be fine,
[ http://issues.apache.org/jira/browse/NUTCH-351?page=comments#action_12438013 ]
Sami Siren commented on NUTCH-351:
--
As the plugin name says, protocol-forwardproxy acts as a protocol
plugin and does not need an additional protocol
[ http://issues.apache.org/jira/browse/NUTCH-266?page=all ]
Sami Siren closed NUTCH-266.
hadoop bug when doing updatedb
--
Key: NUTCH-266
URL: http://issues.apache.org/jira/browse/NUTCH-266
[ http://issues.apache.org/jira/browse/NUTCH-105?page=all ]
Sami Siren closed NUTCH-105.
Network error during robots.txt fetch causes file to be ignored
---
Key: NUTCH-105
[ http://issues.apache.org/jira/browse/NUTCH-318?page=all ]
Sami Siren closed NUTCH-318.
log4j not proper configured, readdb doesnt give any information
---
Key: NUTCH-318
[ http://issues.apache.org/jira/browse/NUTCH-370?page=all ]
Sami Siren updated NUTCH-370:
-
Summary: Generator looses urls when run with LocalJobRunner (was:
Generator loosed urls when run with LocalJobRunner)
Generator looses urls when run with
Generator loosed urls when run with LocalJobRunner
--
Key: NUTCH-370
URL: http://issues.apache.org/jira/browse/NUTCH-370
Project: Nutch
Issue Type: Bug
Components: generator
[ http://issues.apache.org/jira/browse/NUTCH-368?page=comments#action_1243 ]
Sami Siren commented on NUTCH-368:
--
IMO the place for stuff like this is in Hadoop rather than Nutch, and I would
like to see it implemented there.
Mainly because i
[ http://issues.apache.org/jira/browse/NUTCH-365?page=comments#action_12435175 ]
Sami Siren commented on NUTCH-365:
--
looks ok to me,
the ugly (with &amp;) regexps could perhaps be put inside <![CDATA[ ]]> sections
in generator there's
+ try {
+
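The CDATA suggestion in the NUTCH-365 comment can be checked with a small standalone Java sketch: a standard XML parser returns the same text for an `&amp;`-escaped pattern and for the same pattern wrapped in a CDATA section. Assuming the config loader reads element text through an ordinary DOM parser, moving the regexps into CDATA should therefore not change what gets loaded. The element name `pattern`, the session-id regex and the class name are hypothetical, chosen only for illustration; the sketch uses only the JDK's `javax.xml.parsers` API, not Nutch code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CdataVsEscaped {

    // Parse an XML string and return the text content of its root element.
    // getTextContent() concatenates ordinary text and CDATA children alike.
    static String rootText(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical pattern: strip a session-id query parameter.
        // Escaped form: '&' must be written as the entity &amp;.
        String escaped = "<pattern>(\\?|&amp;)sid=[0-9a-f]+</pattern>";
        // CDATA form: the regexp is written verbatim.
        String cdata = "<pattern><![CDATA[(\\?|&)sid=[0-9a-f]+]]></pattern>";

        System.out.println(rootText(escaped));
        System.out.println(rootText(cdata));
        System.out.println(rootText(escaped).equals(rootText(cdata)));
    }
}
```

Both parsed strings come out as `(\?|&)sid=[0-9a-f]+`, so the last line prints `true`. The one thing CDATA cannot contain is the sequence `]]>`, which is unlikely to appear in a URL regexp.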
[ http://issues.apache.org/jira/browse/NUTCH-105?page=all ]
Sami Siren updated NUTCH-105:
-
Fix Version/s: 0.8.1
0.9.0
looks ok to me. If there are no objections I'll commit this before 0.8.1
Network error during robots.txt fetch causes
[ http://issues.apache.org/jira/browse/NUTCH-361?page=comments#action_12433169 ]
Sami Siren commented on NUTCH-361:
--
The / by 0 was due to a bug in the testcase. Now the testcase fails about 50% of
the time. I also noticed that the number of reduce
[ http://issues.apache.org/jira/browse/NUTCH-208?page=comments#action_12433175 ]
Sami Siren commented on NUTCH-208:
--
This looks like a good addition to Nutch, a couple of comments:
-The added comments in HttpResponse should be removed.
-Any
[ http://issues.apache.org/jira/browse/NUTCH-339?page=comments#action_12433185 ]
Sami Siren commented on NUTCH-339:
--
Andrzej,
are you still working on this or should I proceed as I originally planned ;)
Refactor nutch to allow fetcher
[ http://issues.apache.org/jira/browse/NUTCH-273?page=comments#action_12433183 ]
Sami Siren commented on NUTCH-273:
--
+1 for not following redirects immediately - simplify fetcher logic.
I would also like to see a flexible (configurable?)
Remove parse-text from unsupported filetypes in parse-plugins.xml
-
Key: NUTCH-362
URL: http://issues.apache.org/jira/browse/NUTCH-362
Project: Nutch
Issue Type: Bug
[ http://issues.apache.org/jira/browse/NUTCH-361?page=comments#action_12432861 ]
Sami Siren commented on NUTCH-361:
--
nightly builds are broken because of this problem; I scratched my head for a
long time because my local source was working
[ http://issues.apache.org/jira/browse/NUTCH-361?page=comments#action_12432864 ]
Sami Siren commented on NUTCH-361:
--
oops, pasted the wrong property
<property>
  <name>mapred.reduce.tasks</name>
  <value>1</value>
  <description>
  define mapred.reduce
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12432871 ]
Sami Siren commented on NUTCH-266:
--
what version of nutch are you running?
hadoop bug when doing updatedb
--
Key:
Switch nutch to use java 5 source format
Key: NUTCH-360
URL: http://issues.apache.org/jira/browse/NUTCH-360
Project: Nutch
Issue Type: Task
Affects Versions: 0.9.0
Reporter: Sami
[ http://issues.apache.org/jira/browse/NUTCH-360?page=all ]
Sami Siren resolved NUTCH-360.
--
Resolution: Fixed
done
Switch nutch to use java 5 source format
Key: NUTCH-360
URL:
[ http://issues.apache.org/jira/browse/NUTCH-341?page=comments#action_12429029 ]
Sami Siren commented on NUTCH-341:
--
+1 for v2
IndexMerger now deletes entire workingdir after completing
[ http://issues.apache.org/jira/browse/NUTCH-347?page=all ]
Sami Siren resolved NUTCH-347.
--
Fix Version/s: 0.9.0
Resolution: Fixed
Assignee: Sami Siren
committed
Build: plugins' Jars not found
--
[ http://issues.apache.org/jira/browse/NUTCH-338?page=all ]
Sami Siren resolved NUTCH-338.
--
Resolution: Fixed
This is now committed, thank you.
The patch was broken, hopefully I got it right.
Remove the text parser as an option for parsing PDF files in
[ http://issues.apache.org/jira/browse/NUTCH-338?page=comments#action_12429044 ]
Sami Siren commented on NUTCH-338:
--
yeah, svn diff from the command line is the winner.
Remove the text parser as an option for parsing PDF files in parse-plugins.xml
[ http://issues.apache.org/jira/browse/NUTCH-338?page=all ]
Sami Siren updated NUTCH-338:
-
Fix Version/s: 0.8.1
Remove the text parser as an option for parsing PDF files in parse-plugins.xml
Protocol forward proxy
--
Key: NUTCH-351
URL: http://issues.apache.org/jira/browse/NUTCH-351
Project: Nutch
Issue Type: New Feature
Components: fetcher
Affects Versions: 0.8, 0.8.1, 0.9.0
[ http://issues.apache.org/jira/browse/NUTCH-349?page=comments#action_12428399 ]
Sami Siren commented on NUTCH-349:
--
If anything at all should be done, then I'd go for #2. There was also a total
incompatibility from 0.7 to 0.8 and I didn't see
[ http://issues.apache.org/jira/browse/NUTCH-347?page=comments#action_12427729 ]
Sami Siren commented on NUTCH-347:
--
Those warnings are ok - there's not any harm happening. There are some plug-ins
(lib-log4j for example) that don't generate
[ http://issues.apache.org/jira/browse/NUTCH-347?page=all ]
Sami Siren updated NUTCH-347:
-
Attachment: nutch_build_plugins_patch.txt
Build: plugins' Jars not found
--
Key: NUTCH-347
URL:
[ http://issues.apache.org/jira/browse/NUTCH-344?page=all ]
Sami Siren resolved NUTCH-344.
--
Fix Version/s: 0.8.1
0.9.0
Resolution: Fixed
I just committed this to 0.8 branch and trunk, thanks Greg!
Fetcher threads blocked on
[ http://issues.apache.org/jira/browse/NUTCH-266?page=all ]
Sami Siren resolved NUTCH-266.
--
Resolution: Fixed
I just updated the hadoop versions: trunk contains 0.5.0, the 0.8 branch contains a
patched 0.4.0
hadoop bug when doing updatedb
[ http://issues.apache.org/jira/browse/NUTCH-340?page=all ]
Sami Siren resolved NUTCH-340.
--
Fix Version/s: (was: 0.8.1)
Resolution: Fixed
I just committed this to svn trunk and updated the website, thanks!
Bug(s) in 0.8 tutorial
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12425753 ]
Sami Siren commented on NUTCH-266:
--
I am planning to build a patched version of hadoop 0.4.0 that includes a fix
for this problem.
If there are no objections I
Refactor nutch to allow fetcher improvements
-
Key: NUTCH-339
URL: http://issues.apache.org/jira/browse/NUTCH-339
Project: Nutch
Issue Type: Task
Components: fetcher
Affects
[ http://issues.apache.org/jira/browse/NUTCH-339?page=comments#action_12425782 ]
Sami Siren commented on NUTCH-339:
--
I am not sure what you are referring to by this 3-4 sec, but yes, I agree there are
more aspects to optimize in the fetcher; what I was
[ http://issues.apache.org/jira/browse/NUTCH-266?page=all ]
Sami Siren updated NUTCH-266:
-
Fix Version/s: 0.8.1
0.9.0
hadoop bug when doing updatedb
--
Key: NUTCH-266
URL:
[ http://issues.apache.org/jira/browse/NUTCH-339?page=all ]
Sami Siren updated NUTCH-339:
-
Fix Version/s: 0.9.0
Affects Version/s: 0.8
(was: 0.9.0)
Refactor nutch to allow fetcher improvements
[ http://issues.apache.org/jira/browse/NUTCH-340?page=comments#action_12425820 ]
Sami Siren commented on NUTCH-340:
--
thanks for the effort; however, I cannot apply your patch.
Can you please check out
[ http://issues.apache.org/jira/browse/NUTCH-318?page=all ]
Sami Siren resolved NUTCH-318.
--
Fix Version/s: 0.8.1
Resolution: Fixed
Assignee: Sami Siren
marking this as resolved because it is now working ok in a single-node config.
log4j not
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12424930 ]
Sami Siren commented on NUTCH-266:
--
just adding a reminder:
there are two options to get this fixed: use a patched version of hadoop-0.4.0 or
wait until
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ]
Sami Siren updated NUTCH-258:
-
Fix Version/s: 0.9
(was: 0.8)
Once Nutch logs a SEVERE log item, Nutch fails forevermore
[ http://issues.apache.org/jira/browse/NUTCH-318?page=comments#action_12423531 ]
Sami Siren commented on NUTCH-318:
--
Perhaps this is happening in a distributed setup? In a single-machine setup, output
goes to the log file, see NUTCH-315
log4j not proper
[ http://issues.apache.org/jira/browse/NUTCH-318?page=comments#action_12423546 ]
Sami Siren commented on NUTCH-318:
--
I agree :) so the next thing to do is change readdb -stats to print to stdout;
I'll go ahead and do that. Are there any other
[ http://issues.apache.org/jira/browse/NUTCH-315?page=all ]
Sami Siren resolved NUTCH-315.
--
Resolution: Duplicate
duplicate of NUTCH-318
CrawlDbReader usage text - implementation mismatch
--
[ http://issues.apache.org/jira/browse/NUTCH-318?page=comments#action_12423557 ]
Sami Siren commented on NUTCH-318:
--
could this be solved by just adding the following line into conf/log4j.properties?
[ http://issues.apache.org/jira/browse/NUTCH-318?page=comments#action_12423579 ]
Sami Siren commented on NUTCH-318:
--
I just committed to trunk some changes to the log4j configuration for some
command-line tools; is this a satisfactory solution to
[ http://issues.apache.org/jira/browse/NUTCH-249?page=all ]
Sami Siren updated NUTCH-249:
-
Fix Version/s: 0.9-dev
(was: 0.8-dev)
black- white list url filtering
---
Key: NUTCH-249
[ http://issues.apache.org/jira/browse/NUTCH-86?page=all ]
Sami Siren updated NUTCH-86:
Fix Version/s: 0.9-dev
(was: 0.8-dev)
LanguageIdentifier API enhancements
---
Key: NUTCH-86
[ http://issues.apache.org/jira/browse/NUTCH-246?page=all ]
Sami Siren updated NUTCH-246:
-
Fix Version/s: 0.9-dev
(was: 0.8-dev)
segment size is never as big as topN or crawlDB size in a distributed deployement
[ http://issues.apache.org/jira/browse/NUTCH-251?page=all ]
Sami Siren updated NUTCH-251:
-
Fix Version/s: 0.9-dev
(was: 0.8-dev)
Administration GUI
--
Key: NUTCH-251
URL:
[ http://issues.apache.org/jira/browse/NUTCH-318?page=all ]
Sami Siren updated NUTCH-318:
-
Fix Version/s: 0.9-dev
(was: 0.8-dev)
log4j not proper configured, readdb doesnt give any information
[ http://issues.apache.org/jira/browse/NUTCH-322?page=all ]
Sami Siren updated NUTCH-322:
-
Fix Version/s: 0.9-dev
(was: 0.8-dev)
Fetcher discards ProtocolStatus, doesn't store redirected pages
[ http://issues.apache.org/jira/browse/NUTCH-262?page=all ]
Sami Siren updated NUTCH-262:
-
Fix Version/s: 0.9-dev
(was: 0.8-dev)
Summary excerpts and highlights problems
[ http://issues.apache.org/jira/browse/NUTCH-233?page=all ]
Sami Siren updated NUTCH-233:
-
Fix Version/s: 0.9-dev
(was: 0.8-dev)
wrong regular expression hang reduce process for ever
[ http://issues.apache.org/jira/browse/NUTCH-247?page=all ]
Sami Siren updated NUTCH-247:
-
Fix Version/s: 0.9-dev
(was: 0.8-dev)
robot parser to restrict.
-
Key: NUTCH-247
[ http://issues.apache.org/jira/browse/NUTCH-325?page=all ]
Sami Siren updated NUTCH-325:
-
Fix Version/s: 0.9-dev
(was: 0.8-dev)
UrlFilters.java throws NPE in case urlfilter.order contains Filters that are not in plugin.includes
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12422929 ]
Sami Siren commented on NUTCH-266:
--
I finally found the time to set up an environment with cygwin and try this out.
I can confirm that the hadoop.jar version
bin/nutch setting of log path problems on cygwin
Key: NUTCH-327
URL: http://issues.apache.org/jira/browse/NUTCH-327
Project: Nutch
Issue Type: Bug
Affects Versions: 0.8-dev
[ http://issues.apache.org/jira/browse/NUTCH-327?page=all ]
Sami Siren resolved NUTCH-327.
--
Resolution: Fixed
bin/nutch setting of log path problems on cygwin
Key: NUTCH-327
commons-cli-2.0-SNAPSHOT.jar provided with nutch is not compatible with jdk 1.4
---
Key: NUTCH-328
URL: http://issues.apache.org/jira/browse/NUTCH-328
Project: Nutch
[ http://issues.apache.org/jira/browse/NUTCH-328?page=all ]
Sami Siren resolved NUTCH-328.
--
Resolution: Fixed
updated library
commons-cli-2.0-SNAPSHOT.jar provided with nutch is not compatible with jdk 1.4
[ http://issues.apache.org/jira/browse/NUTCH-293?page=comments#action_12421930 ]
Sami Siren commented on NUTCH-293:
--
perhaps instead of
delay = crawlDelay > 0 ? crawlDelay : serverDelay;
we could do
delay = Math.max(crawlDelay, serverDelay);
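To make the difference between the two expressions in the NUTCH-293 comment concrete, here is a small standalone Java sketch. It assumes, as the variable names suggest, that `crawlDelay` is a per-host value that is 0 when unset and `serverDelay` is a non-negative configured default, both in milliseconds; the class name and the sample numbers are hypothetical. Under those assumptions the two expressions differ only when `crawlDelay` is positive but shorter than `serverDelay`: the ternary lets it undercut the default, while `Math.max` never waits less than `serverDelay`.

```java
public class DelayChoice {

    // Expression quoted first in the comment: any positive crawlDelay
    // replaces serverDelay, even when it is shorter.
    static long ternaryDelay(long crawlDelay, long serverDelay) {
        return crawlDelay > 0 ? crawlDelay : serverDelay;
    }

    // Proposed expression: wait for the larger of the two values,
    // so a short crawlDelay can never undercut serverDelay.
    static long maxDelay(long crawlDelay, long serverDelay) {
        return Math.max(crawlDelay, serverDelay);
    }

    public static void main(String[] args) {
        long serverDelay = 5000;               // hypothetical default, ms
        long[] crawlDelays = {0, 1000, 10000}; // hypothetical per-host values, ms
        for (long crawlDelay : crawlDelays) {
            System.out.println("crawlDelay=" + crawlDelay
                + " ternary=" + ternaryDelay(crawlDelay, serverDelay)
                + " max=" + maxDelay(crawlDelay, serverDelay));
        }
    }
}
```

With these sample values only the 1000 ms case disagrees: it prints `ternary=1000` but `max=5000`.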
DmozParser does not output urls to stdout
-
Key: NUTCH-320
URL: http://issues.apache.org/jira/browse/NUTCH-320
Project: Nutch
Issue Type: Bug
Affects Versions: 0.8-dev
Reporter: Sami
[ http://issues.apache.org/jira/browse/NUTCH-320?page=all ]
Sami Siren resolved NUTCH-320.
--
Resolution: Fixed
DmozParser does not output urls to stdout
-
Key: NUTCH-320
URL:
[ http://issues.apache.org/jira/browse/NUTCH-172?page=all ]
Sami Siren resolved NUTCH-172:
--
Fix Version: 0.8-dev
Resolution: Fixed
Assign To: Andrzej Bialecki
this has already been implemented by ab:
mergesegs
Segment merger
[ http://issues.apache.org/jira/browse/NUTCH-306?page=all ]
Sami Siren resolved NUTCH-306:
--
Fix Version: 0.8-dev
Resolution: Fixed
just committed this, thanks Grant!
DistributedSearch.Client liveAddresses concurrency problem
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ]
Sami Siren reassigned NUTCH-110:
Assign To: Sami Siren
OpenSearchServlet outputs illegal xml characters
Key: NUTCH-110