[ https://issues.apache.org/jira/browse/NUTCH-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma closed NUTCH-414.
-------------------------------
Resolution: Won't Fix
Bulk close of legacy issues:
http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_open_legacy_issues_in_jira
> parse-mp3 plugin concatenating previous tags for text field
> -----------------------------------------------------------
>
> Key: NUTCH-414
> URL: https://issues.apache.org/jira/browse/NUTCH-414
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 0.9.0
> Environment: -
> Reporter: Brian Whitman
>
> The parse-mp3 plugin seems to retain the text content of previous parses.
> For every new mp3 file parsed, it puts the contents of all the previous text
> fields into the plain text field for that file.
> To reproduce, fetch a set of mp3s in one segment, then view their plain text
> in the nutch webapp. The plain text of each file will include the contents of
> all files fetched in that round, which makes searching fruitless.
> I made a tiny band-aid change to MP3Parser.java and MetadataCollector.java
> against the nightly. It seems to fix the problem.
> --- MP3Parser.java 2006-12-10 09:43:26.000000000 -0500
> +++ MP3Parser.java.new 2006-12-10 16:37:03.000000000 -0500
> @@ -67,7 +67,7 @@
> fos.write(raw);
> fos.close();
> MP3File mp3 = new MP3File(tmp);
> -
> + metadataCollector.clearText();
> if (mp3.hasID3v2Tag()) {
> parse = getID3v2Parse(mp3, content.getMetadata());
> } else if (mp3.hasID3v1Tag()) {
> --- MetadataCollector.java 2006-12-10 09:43:26.000000000 -0500
> +++ MetadataCollector.java.new 2006-12-10 16:37:28.000000000 -0500
> @@ -42,6 +42,10 @@
> this.conf = conf;
> }
> + public void clearText() {
> + text = "";
> + }
> +
> public void notifyProperty(String name, String value) throws MalformedURLException {
> if (name.equals("TIT2-Text"))
> setTitle(value);
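
For illustration, the failure mode described in the report can be sketched as below. This is a minimal sketch, not the Nutch code: CollectorSketch and parseAll are hypothetical names, and the sketch assumes that the collector appends each tag value to a text field and that one collector instance is reused across files, which is what the clearText() patch suggests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a metadata collector that is reused across parses.
// It is NOT the Nutch MetadataCollector; all names here are illustrative.
public class CollectorSketch {
    private String text = "";

    // Assumption: every tag value is appended to the running text.
    public void notifyProperty(String name, String value) {
        text += value + "\n";
    }

    // Same idea as the clearText() method in the patch: reset per-file state.
    public void clearText() {
        text = "";
    }

    public String getText() {
        return text;
    }

    // Parses a sequence of "files" (each reduced to a single title tag) with
    // one shared collector and returns the text recorded for each file.
    public static List<String> parseAll(String[] titles, boolean clearBetweenFiles) {
        CollectorSketch collector = new CollectorSketch();
        List<String> results = new ArrayList<>();
        for (String title : titles) {
            if (clearBetweenFiles) {
                collector.clearText();
            }
            collector.notifyProperty("TIT2-Text", title);
            results.add(collector.getText());
        }
        return results;
    }

    public static void main(String[] args) {
        String[] titles = {"Song A", "Song B"};
        // Without the reset, the second file also carries the first file's text.
        System.out.print(parseAll(titles, false).get(1));
        // With the reset, the second file carries only its own text.
        System.out.print(parseAll(titles, true).get(1));
    }
}
```

Resetting the field before each file is the band-aid the patch applies. Creating a fresh collector per parse would be another way to avoid shared state, though the report does not say whether that was considered.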