[
https://issues.apache.org/jira/browse/HDFS-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156007#comment-16156007
]
Anu Engineer commented on HDFS-12213:
-------------------------------------
Tagging this as an "ozone merge" work item, since an online tool lets us run
the Ozone system against real-world data. This is not a *must-do* work item
before the merge, since the offline tool already provides enough workload.
So let us treat this as best effort.
> Ozone: Corona: Support for online mode
> --------------------------------------
>
> Key: HDFS-12213
> URL: https://issues.apache.org/jira/browse/HDFS-12213
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Reporter: Nandakumar
> Assignee: Nandakumar
> Labels: ozoneMerge, tool
>
> This jira adds support for online mode in Corona.
> In online mode, Common Crawl data from AWS will be used to populate Ozone
> with data. The default source is [CC-MAIN-2017-17/warc.paths.gz |
> https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2017-17/warc.paths.gz]
> (it contains the paths to the actual data segments); the user can override
> this using -source.
> The following values are derived from the URL of the Common Crawl data:
> * Domain will be used as Volume
> * URL will be used as Bucket
> * FileName will be used as Key
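
The mapping in the description could be sketched roughly as follows. This is only an illustration in Python, not Corona's code: the function name `derive_ozone_names` is hypothetical, and the description does not say how the tool normalizes these values into valid Ozone names, so the sketch does no sanitizing.

```python
from typing import Tuple
from urllib.parse import urlparse
import posixpath


def derive_ozone_names(record_url: str) -> Tuple[str, str, str]:
    """Split a crawled URL into (volume, bucket, key).

    Hypothetical helper: follows the Domain/URL/FileName mapping from the
    issue description and leaves the values unsanitized.
    """
    parsed = urlparse(record_url)
    volume = parsed.hostname or ""         # Domain -> Volume
    bucket = record_url                    # URL -> Bucket (taken as-is)
    key = posixpath.basename(parsed.path)  # FileName -> Key (last path segment)
    return volume, bucket, key


if __name__ == "__main__":
    print(derive_ozone_names("http://example.com/docs/index.html"))
```

A real implementation would probably have to shorten or encode the bucket value, since a raw URL is unlikely to be an acceptable bucket name.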