Giuseppe Totaro created NUTCH-1959:
--------------------------------------

             Summary: Improving CommonCrawlFormat implementations
                 Key: NUTCH-1959
                 URL: https://issues.apache.org/jira/browse/NUTCH-1959
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.9
            Reporter: Giuseppe Totaro
            Priority: Minor


{{CommonCrawlFormat}} is an interface for Java classes that implement methods 
for writing data into Common Crawl format. {{AbstractCommonCrawlFormat}} is an 
abstract class that implements {{CommonCrawlFormat}} and provides abstract 
methods for "CommonCrawl formatter" classes.
The attached patch includes some improvements for the 
{{CommonCrawlFormat}}-based classes:
* {{CommonCrawlFormat}} and {{AbstractCommonCrawlFormat}} now provide only the 
{{getJsonData()}} method, which is responsible for returning the JSON data.
* {{AbstractCommonCrawlFormat}} also provides the abstract methods that each 
subclass has to implement in order to handle JSON objects.
* {{CommonCrawlFormatSimple}} is a {{StringBuilder}}-based formatter that now 
also escapes JSON string values.
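
As an illustration of the last point, here is a minimal sketch of how a 
{{StringBuilder}}-based formatter might escape JSON string values. It is not 
the code from the patch: the class {{JsonEscapeSketch}} and the methods 
{{escape()}} and {{appendField()}} are hypothetical names chosen for this 
example, and the escaping rules are the ones for JSON strings in RFC 8259.

```java
// Hypothetical sketch, not the code from the attached patch.
final class JsonEscapeSketch {

    // Escapes a raw string so that it can be placed between double quotes
    // in a JSON document (quote, backslash and control characters).
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\b': sb.append("\\b");  break;
                case '\f': sb.append("\\f");  break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Appends "key": "value" to the builder, escaping both parts.
    static void appendField(StringBuilder sb, String key, String value) {
        sb.append('"').append(escape(key)).append("\": \"")
          .append(escape(value)).append('"');
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("{");
        appendField(sb, "title", "He said \"hi\"\n");
        sb.append('}');
        System.out.println(sb);
    }
}
```

Without this step, a page title or body containing a double quote or a 
newline would produce output that a JSON parser rejects.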

This patch aims to provide a better interface for implementing and extending 
{{CommonCrawlFormat}} classes.
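
To make the intended structure concrete, the following is a rough sketch of 
the design, assuming a template-method layout: the interface exposes a single 
{{getJsonData()}} method, the abstract class drives the output, and each 
subclass only decides how JSON objects are written. All type and method names 
below ({{CrawlFormatSketch}}, {{writeKeyValue()}}, and so on) and the 
{{url}}/{{content}} fields are hypothetical and do not reproduce the 
signatures in the patch.

```java
// Hypothetical sketch of the design; names do not match the patch.
final class FormatSketchDemo {
    public static void main(String[] args) {
        CrawlFormatSketch format =
            new SimpleCrawlFormatSketch("http://example.org/", "hello");
        System.out.println(format.getJsonData());
    }
}

// The interface exposes only the method that returns the JSON data.
interface CrawlFormatSketch {
    String getJsonData();
}

// The abstract class fixes the order of the output and delegates the
// handling of JSON objects to its subclasses.
abstract class AbstractCrawlFormatSketch implements CrawlFormatSketch {
    protected final String url;
    protected final String content;

    protected AbstractCrawlFormatSketch(String url, String content) {
        this.url = url;
        this.content = content;
    }

    @Override
    public final String getJsonData() {
        startObject();
        writeKeyValue("url", url);
        writeKeyValue("content", content);
        endObject();
        return generateJson();
    }

    protected abstract void startObject();
    protected abstract void writeKeyValue(String key, String value);
    protected abstract void endObject();
    protected abstract String generateJson();
}

// A StringBuilder-based implementation of the abstract methods.
final class SimpleCrawlFormatSketch extends AbstractCrawlFormatSketch {
    private final StringBuilder sb = new StringBuilder();
    private boolean first = true;

    SimpleCrawlFormatSketch(String url, String content) {
        super(url, content);
    }

    @Override protected void startObject() {
        sb.setLength(0); // reset so getJsonData() can be called repeatedly
        first = true;
        sb.append('{');
    }

    @Override protected void writeKeyValue(String key, String value) {
        if (!first) sb.append(',');
        first = false;
        // Simplified escaping: only backslash and double quote are handled.
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        sb.append('"').append(key).append("\":\"").append(escaped).append('"');
    }

    @Override protected void endObject() { sb.append('}'); }

    @Override protected String generateJson() { return sb.toString(); }
}
```

With this layout, a formatter backed by a JSON library would only need to 
override the four abstract methods, while callers keep using 
{{getJsonData()}}.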

I would really appreciate your feedback.
Thanks a lot,
Giuseppe



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
