jihoonson commented on issue #6036: use S3 as a backup storage for hdfs deep 
storage
URL: https://github.com/apache/incubator-druid/pull/6036#issuecomment-407161433
 
 
   Hi @gaodayue, thanks for the PR. I have a question.
   
   > In many organization, Hadoop and HDFS are typically used in offline data 
analysis, while Druid is targeting online data serving. Thus SLA provided by 
HDFS often can't meet the needs of Druid. 
   
   - I think, if this is the case, you might need to increase the write throughput of your HDFS cluster or use a separate deep storage. If the first option isn't available to you, does it make sense to use only S3 as your deep storage?
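
   A minimal sketch of what "use only S3 as your deep storage" would look like in the common runtime properties (the property names come from Druid's `druid-s3-extensions` extension; the bucket name and base key below are hypothetical):

   ```properties
   # Load the S3 extension and point deep storage at S3 instead of HDFS
   druid.extensions.loadList=["druid-s3-extensions"]
   druid.storage.type=s3
   # Hypothetical bucket/prefix -- substitute your own
   druid.storage.bucket=your-druid-bucket
   druid.storage.baseKey=druid/segments
   ```

   With this in place there is a single deep storage, so no backup/restore tooling is needed.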
   
   As for the approach taken in this PR, I'm not sure it is a good idea. Maybe we need to define the concept of a backup deep storage for all deep storage types and support it generally, with the primary and backup deep storages kept in sync automatically.
   
   But this PR supports the feature only for HDFS deep storage, and it appears to require a separate tool, `restore-hdfs-segment`, to ensure that all segments eventually reside in HDFS. That adds extra operational steps which make running Druid more difficult.
   
   Side comment: the Kafka indexing service guarantees exactly-once ingestion, so data loss is never expected. If deep storage is unavailable, all attempts to publish segments will fail, and each task will restart from the offset at which publishing failed. Reprocessing the same data can slow ingestion down, but there should be no data loss or duplication.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]