[
https://issues.apache.org/jira/browse/NUTCH-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533787
]
Sami Siren commented on NUTCH-565:
--
bq. Both jars are LGPL.
I think that prohibits direct inclusion then. Take a
[
https://issues.apache.org/jira/browse/NUTCH-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533458
]
Sami Siren commented on NUTCH-565:
--
What are the licenses for those jars?
Arc File to Nutch Segments Converter
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508820
]
Sami Siren commented on NUTCH-392:
--
But why is parse_text_block's size so close to parse_text
data of parse_text
[
https://issues.apache.org/jira/browse/NUTCH-499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508449
]
Sami Siren commented on NUTCH-499:
--
+1, seems good to me
Refactor LinkDb and LinkDbMerger to reuse code
[
https://issues.apache.org/jira/browse/NUTCH-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508508
]
Sami Siren commented on NUTCH-498:
--
+1
Use Combiner in LinkDb to increase speed of linkdb generation
[
https://issues.apache.org/jira/browse/NUTCH-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508222
]
Sami Siren commented on NUTCH-434:
--
You missed one ObjectWritable in Indexer (the one that hit my head too hard
[
https://issues.apache.org/jira/browse/NUTCH-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508239
]
Sami Siren commented on NUTCH-434:
--
Now there is a good chance that you knew all this :). If your point was that
[
https://issues.apache.org/jira/browse/NUTCH-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren updated NUTCH-496:
-
Attachment: nutch-496.txt
This patch changes LanguageIdentifier to have NGramProfile per thread instead
[
https://issues.apache.org/jira/browse/NUTCH-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501266
]
Sami Siren commented on NUTCH-496:
--
I believe the problem is even more severe. Now several threads share the
[
https://issues.apache.org/jira/browse/NUTCH-161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren updated NUTCH-161:
-
Fix Version/s: 1.0.0
Assignee: Sami Siren
Summary: Change Plain text parser to use
[
https://issues.apache.org/jira/browse/NUTCH-482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren resolved NUTCH-482.
--
Resolution: Fixed
Fix Version/s: 1.0.0
Remove redundant plugin lib-log4j
[
https://issues.apache.org/jira/browse/NUTCH-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren resolved NUTCH-483.
--
Resolution: Fixed
Fix Version/s: 1.0.0
Assignee: Sami Siren
remove redundant
[
https://issues.apache.org/jira/browse/NUTCH-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren resolved NUTCH-484.
--
Resolution: Fixed
committed and updated site, thanks Gal
Nutch Nightly API link is broken in site
Remove redundant plugin lib-log4j
-
Key: NUTCH-482
URL: https://issues.apache.org/jira/browse/NUTCH-482
Project: Nutch
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Sami Siren
remove redundant commons-logging jar from ontology plugin
-
Key: NUTCH-483
URL: https://issues.apache.org/jira/browse/NUTCH-483
Project: Nutch
Issue Type: Bug
Affects Versions:
[
https://issues.apache.org/jira/browse/NUTCH-456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren resolved NUTCH-456.
--
Resolution: Fixed
committed with minor modifications (used StringBuilder instead of StringBuffer,
[
https://issues.apache.org/jira/browse/NUTCH-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren resolved NUTCH-446.
--
Resolution: Fixed
I just committed this, keep the patches coming Doğacan!
RobotRulesParser should
[
https://issues.apache.org/jira/browse/NUTCH-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren updated NUTCH-469:
-
Attachment: NUTCH-469-2007-05-09.txt.gz
thanks for putting this together, I briefly checked through the
[
https://issues.apache.org/jira/browse/NUTCH-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren updated NUTCH-469:
-
Fix Version/s: (was: 0.7.3)
1.0.0
changes to geoPosition plugin to make it work
[
https://issues.apache.org/jira/browse/NUTCH-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494531
]
Sami Siren commented on NUTCH-477:
--
I don't feel strongly about this but could enums be used instead of static
[
https://issues.apache.org/jira/browse/NUTCH-472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494534
]
Sami Siren commented on NUTCH-472:
--
have a patch?
NullPointerException in ZipTextExtractor if no MIME type for
[
https://issues.apache.org/jira/browse/NUTCH-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494537
]
Sami Siren commented on NUTCH-476:
--
md5 sum (or any other configurable digest) is already calculated in fetcher
or
[
https://issues.apache.org/jira/browse/NUTCH-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492850
]
Sami Siren commented on NUTCH-446:
--
+1
RobotRulesParser should ignore Crawl-delay values of other bots in
[
https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491305
]
Sami Siren commented on NUTCH-471:
--
Isn't the DCL declared to be broken?
We could perhaps instead instantiate
[
https://issues.apache.org/jira/browse/NUTCH-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren resolved NUTCH-473.
--
Resolution: Duplicate
duplicate of NUTCH-456
ExcelExtractor performance bad due to String
[
https://issues.apache.org/jira/browse/NUTCH-432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren resolved NUTCH-432.
--
Resolution: Fixed
Fix Version/s: 0.9.0
The problem above has been fixed by ab.
JAVA_PLATFORM
[
https://issues.apache.org/jira/browse/NUTCH-432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren reopened NUTCH-432:
--
After this got applied there's this error printed on console when run on FC5:
bin/nutch: line 152:
[
https://issues.apache.org/jira/browse/NUTCH-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479419
]
Sami Siren commented on NUTCH-457:
--
+1
Create top level dist directory and checkin KEYS file to subversion be
[
https://issues.apache.org/jira/browse/NUTCH-400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren resolved NUTCH-400.
--
Resolution: Fixed
Fix Version/s: (was: 0.8.2)
I think this is pretty much done.
Update
[
https://issues.apache.org/jira/browse/NUTCH-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474257
]
Sami Siren commented on NUTCH-247:
--
Setting even a bogus agent name is an insignificant effort compared to the
[
https://issues.apache.org/jira/browse/NUTCH-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473990
]
Sami Siren commented on NUTCH-247:
--
Agent name has actually only relevance in http. IMO not setting agent name
[
https://issues.apache.org/jira/browse/NUTCH-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467912
]
Sami Siren commented on NUTCH-434:
--
It's only half way if we get the Configuration into our subclass, there's no
[
https://issues.apache.org/jira/browse/NUTCH-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467916
]
Sami Siren commented on NUTCH-258:
--
I haven't noticed this being a problem for me, so no objections from here.
[
https://issues.apache.org/jira/browse/NUTCH-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467927
]
Sami Siren commented on NUTCH-434:
--
I can see the light, overriding readFields is sufficient.
Replace usage of
[
https://issues.apache.org/jira/browse/NUTCH-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467491
]
Sami Siren commented on NUTCH-433:
--
ok, now it is committed, sorry.
java.io.EOFException in newer nightlies in
[
https://issues.apache.org/jira/browse/NUTCH-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467101
]
Sami Siren commented on NUTCH-433:
--
I am working on this and will probably submit a patch today.
[
https://issues.apache.org/jira/browse/NUTCH-433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren reassigned NUTCH-433:
Assignee: Sami Siren
java.io.EOFException in newer nightlies in mergesegs or indexing from
[
https://issues.apache.org/jira/browse/NUTCH-433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren resolved NUTCH-433.
--
Resolution: Fixed
Fix Version/s: 0.9.0
I just committed a fix for this, however at least I am
Replace usage of ObjectWritable with something based on GenericWritable
---
Key: NUTCH-434
URL: https://issues.apache.org/jira/browse/NUTCH-434
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465493
]
Sami Siren commented on NUTCH-61:
-
Haven't looked at the patch (tm)
How would one manage segments after something like
[
https://issues.apache.org/jira/browse/NUTCH-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465540
]
Sami Siren commented on NUTCH-61:
-
ok, so in my usual use case where there are far more urls than I can fetch this
[
https://issues.apache.org/jira/browse/NUTCH-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren resolved NUTCH-430.
--
Resolution: Fixed
Fix Version/s: 0.9.0
committed in revision 495732 with additional whitespace
[
https://issues.apache.org/jira/browse/NUTCH-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren updated NUTCH-430:
-
Attachment: NUTCH-430.patch
integer overflow in HashComparator.compare
[
https://issues.apache.org/jira/browse/NUTCH-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464347
]
Sami Siren commented on NUTCH-422:
--
Is there a reason for the two jakarta-regexp-jars (v 1.2 and 1.3) in source
[
https://issues.apache.org/jira/browse/NUTCH-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464351
]
Sami Siren commented on NUTCH-422:
--
couple of more points:
-source files use tabs for indentation
-headers of files
[
https://issues.apache.org/jira/browse/NUTCH-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren resolved NUTCH-428.
--
Resolution: Fixed
Fix Version/s: 0.9.0
Most probably you don't have agent name configured in
[
https://issues.apache.org/jira/browse/NUTCH-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463059
]
Sami Siren commented on NUTCH-420:
--
The feather 'Licensed for inclusion in ASF works' is missing from 2nd patch.
[
https://issues.apache.org/jira/browse/NUTCH-325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren resolved NUTCH-325.
--
Resolution: Fixed
just committed this with additional junit testcase. Thanks Stefan!
UrlFilters.java
[
https://issues.apache.org/jira/browse/NUTCH-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren reassigned NUTCH-421:
Assignee: Sami Siren
Allow predeterminate running order of index filters
[
https://issues.apache.org/jira/browse/NUTCH-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren reassigned NUTCH-422:
Assignee: Sami Siren
index-extra plugin creates additional fields in the index, based on
[
https://issues.apache.org/jira/browse/NUTCH-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sami Siren resolved NUTCH-421.
--
Resolution: Fixed
Fix Version/s: 0.9.0
Thanks Alan,
I just committed this with additionali
[
http://issues.apache.org/jira/browse/NUTCH-418?page=comments#action_12460282 ]
Sami Siren commented on NUTCH-418:
--
We should perhaps include the rest of changes made in NUTCH-362.
Fixes parsing of XHTML (e.g. title)
[
http://issues.apache.org/jira/browse/NUTCH-415?page=comments#action_12458814 ]
Sami Siren commented on NUTCH-415:
--
Please also consider the performance implications. If this marking will add
significant performance overhead then it would be
[
http://issues.apache.org/jira/browse/NUTCH-248?page=comments#action_12457437 ]
Sami Siren commented on NUTCH-248:
--
Seems like the latest java has built-in support
http://java.sun.com/javase/6/docs/api/java/net/IDN.html
add support for
[
http://issues.apache.org/jira/browse/NUTCH-339?page=comments#action_12453975 ]
Sami Siren commented on NUTCH-339:
--
perhaps that exception is just a consequence of something else, like this:
2006-11-27 07:35:09,434 INFO fetcher.Fetcher2 -
[
http://issues.apache.org/jira/browse/NUTCH-339?page=comments#action_12454045 ]
Sami Siren commented on NUTCH-339:
--
I am running with 300 threads, and in parsing mode
thread dump shows:
191 threads waiting on condition
at
[
http://issues.apache.org/jira/browse/NUTCH-339?page=comments#action_12452522 ]
Sami Siren commented on NUTCH-339:
--
patch applies ok, but there's this error when I try to compile:
compile:
[echo] Compiling plugin: lib-http
[javac]
[
http://issues.apache.org/jira/browse/NUTCH-251?page=comments#action_12452321 ]
Sami Siren commented on NUTCH-251:
--
Are you thinking of something like UI extension point like in contrib/web2 ?
not necessarily, that was also a quick hack I
[ http://issues.apache.org/jira/browse/NUTCH-362?page=all ]
Sami Siren resolved NUTCH-362.
--
Resolution: Fixed
Remove parse-text from unsupported filetypes in parse-plugins.xml
-
[
http://issues.apache.org/jira/browse/NUTCH-251?page=comments#action_12451527 ]
Sami Siren commented on NUTCH-251:
--
I am a strong supporter of XML. Can we not re-think about this like SOLR-58 or
plain/jsp like the way hadoop does it?
I
Fix LinkDB Usage - implementation mismatch
--
Key: NUTCH-404
URL: http://issues.apache.org/jira/browse/NUTCH-404
Project: Nutch
Issue Type: Bug
Components: linkdb
Reporter: Sami
[ http://issues.apache.org/jira/browse/NUTCH-404?page=all ]
Sami Siren resolved NUTCH-404.
--
Fix Version/s: 0.9.0
Resolution: Fixed
fixed
Fix LinkDB Usage - implementation mismatch
--
Key:
[ http://issues.apache.org/jira/browse/NUTCH-403?page=all ]
Sami Siren resolved NUTCH-403.
--
Fix Version/s: 0.9.0
Resolution: Fixed
Committed to trunk with change to name of conf parameter.
Make URL filtering optional in Generator
[ http://issues.apache.org/jira/browse/NUTCH-403?page=all ]
Sami Siren updated NUTCH-403:
-
The command that is altered is generate (Generator) not crawl.
Make URL filtering optional in Generator
Key:
[ http://issues.apache.org/jira/browse/NUTCH-388?page=all ]
Sami Siren resolved NUTCH-388.
--
Fix Version/s: 0.9.0
Resolution: Fixed
This is now fixed (rev 476617). Thanks for reporting it!
nutch-default.xml has outdated example for urlfilter.order
[ http://issues.apache.org/jira/browse/NUTCH-395?page=all ]
Sami Siren resolved NUTCH-395.
--
Fix Version/s: 0.9.0
Resolution: Fixed
applied to trunk with some additional whitespace changes.
Increase fetching speed
---
[ http://issues.apache.org/jira/browse/NUTCH-395?page=all ]
Sami Siren updated NUTCH-395:
-
Attachment: NUTCH-395-trunk-metadata-only-2.patch
Additional change to Content cuts down time needed in effective fetching. Now
seeing speeds like 45 pages/sec also
[ http://issues.apache.org/jira/browse/NUTCH-395?page=all ]
Sami Siren updated NUTCH-395:
-
Attachment: NUTCH-395-trunk-metadata-only.patch
Here's a first stab at svn trunk version of nutch that just optimizes the use
of metadata and splits it into two
[ http://issues.apache.org/jira/browse/NUTCH-395?page=all ]
Sami Siren updated NUTCH-395:
-
Affects Version/s: 0.9.0
Increase fetching speed
---
Key: NUTCH-395
URL:
[
http://issues.apache.org/jira/browse/NUTCH-398?page=comments#action_12448949 ]
Sami Siren commented on NUTCH-398:
--
Did anyone try using a single machine, not in local mode but with nutch
acting like one node? Maybe this is a workaround
Change CommandRunner to use concurrent api from jdk
---
Key: NUTCH-399
URL: http://issues.apache.org/jira/browse/NUTCH-399
Project: Nutch
Issue Type: Task
Reporter: Sami Siren
[ http://issues.apache.org/jira/browse/NUTCH-399?page=all ]
Sami Siren resolved NUTCH-399.
--
Fix Version/s: 0.9.0
Resolution: Fixed
Change CommandRunner to use concurrent api from jdk
---
Update add missing license headers
Key: NUTCH-400
URL: http://issues.apache.org/jira/browse/NUTCH-400
Project: Nutch
Issue Type: Task
Affects Versions: 0.8.2, 0.9.0
Reporter: Sami Siren
[ http://issues.apache.org/jira/browse/NUTCH-400?page=all ]
Sami Siren updated NUTCH-400:
-
Fix Version/s: 0.8.2
0.9.0
Update add missing license headers
Key: NUTCH-400
[
http://issues.apache.org/jira/browse/NUTCH-395?page=comments#action_12448795 ]
Sami Siren commented on NUTCH-395:
--
have you measured what made the biggest impact on performance - changes to
Metadata, or
changes to IO in FetcherOutput?
did
[
http://issues.apache.org/jira/browse/NUTCH-395?page=comments#action_12445956 ]
Sami Siren commented on NUTCH-395:
--
have you measured what made the biggest impact on performance - changes to
Metadata, or
changes to IO in FetcherOutput?
did
Increase fetching speed
---
Key: NUTCH-395
URL: http://issues.apache.org/jira/browse/NUTCH-395
Project: Nutch
Issue Type: Improvement
Components: fetcher
Affects Versions: 0.8.1
Reporter: Sami
[ http://issues.apache.org/jira/browse/NUTCH-395?page=all ]
Sami Siren updated NUTCH-395:
-
Attachment: nutch-0.8-performance.txt
a rough patch for testing purposes
Increase fetching speed
---
Key: NUTCH-395
ParseUtil logs file contents to log file when it cannot find parser
---
Key: NUTCH-391
URL: http://issues.apache.org/jira/browse/NUTCH-391
Project: Nutch
Issue Type: Bug
[ http://issues.apache.org/jira/browse/NUTCH-391?page=all ]
Sami Siren resolved NUTCH-391.
--
Resolution: Fixed
ParseUtil logs file contents to log file when it cannot find parser
---
[ http://issues.apache.org/jira/browse/NUTCH-379?page=all ]
Sami Siren resolved NUTCH-379.
--
Resolution: Fixed
Committed this to 0.8(.x) branch and trunk. Thanks Chris.
ParseUtil does not pass through the content's URL to the ParserFactory
[ http://issues.apache.org/jira/browse/NUTCH-52?page=all ]
Sami Siren closed NUTCH-52.
---
Parser plugin for MS Excel files
Key: NUTCH-52
URL: http://issues.apache.org/jira/browse/NUTCH-52
[ http://issues.apache.org/jira/browse/NUTCH-53?page=all ]
Sami Siren closed NUTCH-53.
---
Parser plugin for Zip files
---
Key: NUTCH-53
URL: http://issues.apache.org/jira/browse/NUTCH-53
[ http://issues.apache.org/jira/browse/NUTCH-81?page=all ]
Sami Siren closed NUTCH-81.
---
Webapp only works when deployed in root
---
Key: NUTCH-81
URL:
[ http://issues.apache.org/jira/browse/NUTCH-88?page=all ]
Sami Siren closed NUTCH-88.
---
Enhance ParserFactory plugin selection policy
-
Key: NUTCH-88
URL:
[ http://issues.apache.org/jira/browse/NUTCH-102?page=all ]
Sami Siren closed NUTCH-102.
jobtracker does not start when webapps is in src
Key: NUTCH-102
URL:
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ]
Sami Siren closed NUTCH-110.
OpenSearchServlet outputs illegal xml characters
Key: NUTCH-110
URL:
[ http://issues.apache.org/jira/browse/NUTCH-116?page=all ]
Sami Siren closed NUTCH-116.
TestNDFS a JUnit test specifically for NDFS
---
Key: NUTCH-116
URL:
[ http://issues.apache.org/jira/browse/NUTCH-114?page=all ]
Sami Siren closed NUTCH-114.
getting number of urls and links from crawldb
-
Key: NUTCH-114
URL:
[ http://issues.apache.org/jira/browse/NUTCH-108?page=all ]
Sami Siren closed NUTCH-108.
tasktracker crashs when reconnecting to a new jobtracker.
-
Key: NUTCH-108
URL:
[ http://issues.apache.org/jira/browse/NUTCH-130?page=all ]
Sami Siren closed NUTCH-130.
Be explicit about target JVM when building (1.4.x?)
---
Key: NUTCH-130
URL:
[ http://issues.apache.org/jira/browse/NUTCH-131?page=all ]
Sami Siren closed NUTCH-131.
Non-documented variable: mapred.child.heap.size
---
Key: NUTCH-131
URL:
[ http://issues.apache.org/jira/browse/NUTCH-118?page=all ]
Sami Siren closed NUTCH-118.
FAQ link points to invalid URL
--
Key: NUTCH-118
URL: http://issues.apache.org/jira/browse/NUTCH-118
[ http://issues.apache.org/jira/browse/NUTCH-124?page=all ]
Sami Siren closed NUTCH-124.
protocol-httpclient does not follow redirects when fetching robots.txt
--
Key:
[ http://issues.apache.org/jira/browse/NUTCH-135?page=all ]
Sami Siren closed NUTCH-135.
http header meta data are case insensitive in the real world (e.g.
Content-Type or content-type)
[ http://issues.apache.org/jira/browse/NUTCH-134?page=all ]
Sami Siren closed NUTCH-134.
Summarizer doesn't select the best snippets
---
Key: NUTCH-134
URL:
[ http://issues.apache.org/jira/browse/NUTCH-139?page=all ]
Sami Siren closed NUTCH-139.
Standard metadata property names in the ParseData metadata
--
Key: NUTCH-139
[ http://issues.apache.org/jira/browse/NUTCH-146?page=all ]
Sami Siren closed NUTCH-146.
mapred.job.tracker.info.port is defined 2 times in the nutch-default.xml
Key:
[ http://issues.apache.org/jira/browse/NUTCH-145?page=all ]
Sami Siren closed NUTCH-145.
build of war file fails on Chinese (zh) .xml files due to UTF-8 BOM
---
Key: NUTCH-145
[ http://issues.apache.org/jira/browse/NUTCH-166?page=all ]
Sami Siren closed NUTCH-166.
secure jobtracker info pages with a password
Key: NUTCH-166
URL: