On Thu, Nov 6, 2014 at 7:19 PM, Srinath Perera <[email protected]> wrote:
> Ah sorry, I misunderstood.
>
> Buffering in memory and writing to HDFS will be faster. By writing to disk, you reduce the probability of losing that data, by making it a bit slower.
>
> However, if you are running two receivers, the probability that you will lose data is lower anyway. So I guess buffering in memory and writing to HDFS would be OK.

Great! Yeah, true. In either approach, and even now, there is anyway a high probability of losing some events if the server fails, because most often there will be a few events in the publisher queue, other in-memory buffers, the OS I/O buffers in the file scenario, and so on. To be totally reliable, we would have to use a transport like JMS to achieve that.

Cheers,
Anjana.

> --Srinath
>
> On Fri, Nov 7, 2014 at 8:24 AM, Anjana Fernando <[email protected]> wrote:
>
>> Hi Srinath,
>>
>> I think that example is a bit flawed :) .. I didn't mean to compare Cassandra with the HDFS case here. I know Cassandra is far more complicated, whereas the data operations in HDFS are very simple, and I have a feeling that, with events that small, it may have turned into a CPU-bound operation rather than an I/O-bound one, because of the processing required for each event (maybe their batch implementation is crappy); that may be why even the bigger batch is also slow. The OS-level buffers you mentioned, yes, they efficiently batch the physical disk writes in memory and flush them out later. But that is a different thing; here we are just writing to the disk and reading it back again, so as I see it, we are simply using the local disk as a buffer, where we could do the same thing in RAM. Basically, build up sizable chunks in memory and write them to HDFS. That way we avoid the, even if comparatively small, overhead of writing to and reading from the local disk, while the bottleneck would still be writing the data out over the network to a remote server's disk somewhere. Simply put, this direct HDFS operation should be able to saturate the network link we have; and even if it can't, we should ask ourselves how writing to the local disk and reading it back again could optimize it any further.
>>
>> Cheers,
>> Anjana.
>>
>> On Thu, Nov 6, 2014 at 6:15 PM, Srinath Perera <[email protected]> wrote:
>>
>>> Of course we need to try it out and verify; I am just making the case that we should try it out :)
>>>
>>> Also, RDBMS should be the default, as most scenarios can be handled with DBs, and there is no reason to make everyone's life complicated.
>>>
>>> --Srinath
>>>
>>> On Fri, Nov 7, 2014 at 7:44 AM, Srinath Perera <[email protected]> wrote:
>>>
>>>> 1) Anjana, you are assuming that bandwidth is the bottleneck. Let me give an example.
>>>>
>>>> With sequential reads and writes, an HDD can do > 100 MB/sec, and a 1G network can do > 50 MB/sec. But the best BAM number we have seen is about 40k events/sec (and that was with 4 machines or so; let's assume one machine). Let's assume 20-byte events. Then it will be doing < 1 MB/sec.
>>>>
>>>> The problem is that Cassandra breaks the data into a lot of small operations, losing the OS-level buffer-to-buffer transfers that file transfers can do. I have tried increasing the batch size for Cassandra, which helps with smaller batches. But after about a few thousand operations in the same batch, things start to get much slower.
>>>>
>>>> The best numbers will come when we run two receivers instead of NFS.
>>>>
>>>> 2) Frank, this is analytics data. So it is read-only, and in most cases we need only time-based queries at low resolution (a 15-minute smallest resolution is fine for most cases). That is to say, run a given batch query on the last hour of data, and so on.
>>>>
>>>> However, we have some scenarios where we do ad-hoc queries for things like activity monitoring. The above would not work for those, and we would have to run a batch job to push that data to an RDBMS or Solr etc. Anjana, we need to discuss this.
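[As a sanity check on the numbers quoted above, the arithmetic can be written out explicitly. This is a quick sketch, not from the original thread: the 40k events/sec rate, 20-byte event size, and disk/network figures are taken from Srinath's mail, while the MiB convention is an assumption.]

```python
# Back-of-the-envelope check of the throughput figures quoted in the thread.
events_per_sec = 40_000        # best observed BAM ingest rate (from the thread)
event_size_bytes = 20          # assumed average event size (from the thread)

ingest_mb_per_sec = events_per_sec * event_size_bytes / (1024 * 1024)
hdd_mb_per_sec = 100           # sequential HDD throughput (> 100 MB/s claimed)
net_mb_per_sec = 50            # usable 1G network throughput (> 50 MB/s claimed)

print(f"ingest rate: {ingest_mb_per_sec:.2f} MB/s")        # well under 1 MB/s
print(f"HDD headroom: ~{hdd_mb_per_sec / ingest_mb_per_sec:.0f}x")
print(f"network headroom: ~{net_mb_per_sec / ingest_mb_per_sec:.0f}x")
```

So even at the best observed rate, ingest uses well under 1% of what a single disk or link can move sequentially, which is the gap the file-based approach is trying to close.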
>>>> But also, there are a lot of use cases where we want to receive and write the events to disk as soon as possible and later run MapReduce on top of them. For those, the above will work.
>>>>
>>>> --Srinath
>>>>
>>>> On Fri, Nov 7, 2014 at 7:23 AM, Anjana Fernando <[email protected]> wrote:
>>>>
>>>>> Hi Sanjiva,
>>>>>
>>>>> On Thu, Nov 6, 2014 at 4:01 PM, Sanjiva Weerawarana <[email protected]> wrote:
>>>>>
>>>>>> Anjana, I think the idea was for the file system -> HDFS upload to happen via a simple cron-job type thing.
>>>>>
>>>>> Even so, we would just be moving the problem to another area; the overall work done by that hardware is still the same (writing to disk, reading it back, writing it to the network). That is, even though we can get to a very high throughput initially by writing to the local disk first, later on we have to read the data back and write it to HDFS over the network, which is the slower part of the operation. So if we keep loading the machine at an extreme throughput, it will eventually run out of disk space.
>>>>>
>>>>> Cheers,
>>>>> Anjana.
>>>>>
>>>>>> On Wed, Nov 5, 2014 at 9:19 AM, Anjana Fernando <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Srinath,
>>>>>>>
>>>>>>> Wouldn't it be better if we just made the batch size bigger? That is, let's have a sizable local in-memory store, probably close to 64 MB, which is the default HDFS block size, and flush the buffer only after it is filled, or maybe when the receiver is idle.
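[The buffered-flush idea described above — fill a sizable in-memory store, then flush it out in one go — can be sketched roughly as follows. This is a minimal illustration, not the actual receiver code: `flush_fn` stands in for whatever actually writes the chunk to HDFS, and the 64 MB default mirrors the HDFS block size mentioned in the thread.]

```python
class EventBuffer:
    """Accumulate serialized events in memory and hand a single large
    chunk to flush_fn once the buffer reaches the threshold (e.g. one
    HDFS block), instead of issuing many small I/O operations."""

    def __init__(self, flush_fn, threshold_bytes=64 * 1024 * 1024):
        self.flush_fn = flush_fn          # e.g. an HDFS append (assumed)
        self.threshold = threshold_bytes
        self.chunks = []
        self.size = 0

    def append(self, event_bytes):
        self.chunks.append(event_bytes)
        self.size += len(event_bytes)
        if self.size >= self.threshold:
            self.flush()

    def flush(self):
        """Flush early too, e.g. when the receiver goes idle."""
        if self.chunks:
            self.flush_fn(b"".join(self.chunks))
            self.chunks, self.size = [], 0
```

The point of the design is that the expensive downstream write always sees one block-sized chunk, so the receiver's per-event cost is just an in-memory append.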
>>>>>>> I was just thinking that writing to the file system first will itself be expensive, since there are the additional steps of writing all the records to the local file system and reading them back, and then finally writing them to HDFS; and of course, having a network file system on top would again be an overhead, not to mention the implementation/configuration complications that would come with it. IMHO, we should try to keep these scenarios as simple as possible.
>>>>>>>
>>>>>>> I'm doing our new BAM data layer implementations here [1]; I'm almost done with an RDBMS implementation and doing some refactoring now (mail on this yet to come :)). I can also do an HDFS one after that and check it.
>>>>>>>
>>>>>>> [1] https://github.com/wso2/carbon-analytics/tree/master/components/xanalytics
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Anjana.
>>>>>>>
>>>>>>> On Tue, Nov 4, 2014 at 6:56 PM, Srinath Perera <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> The following came out of a chat with Sanjiva on a scenario involving a very large number of events coming into BAM.
>>>>>>>>
>>>>>>>> Currently we use Cassandra to store the events; the numbers we got out of it have not been great, and Cassandra needs too much attention to get to those numbers.
>>>>>>>>
>>>>>>>> With Cassandra (or any DB) we write data as records. We can batch them, but the amount of data in one I/O operation is still small. In comparison, file transfers are much, much faster; that is the fastest way to get data from A to B.
>>>>>>>>
>>>>>>>> So I am proposing that we write the events that come in to a local file in the Data Receiver, and periodically append them to an HDFS file. We can arrange the data in a folder per stream and in files by timestamp (e.g. 1h of data goes to a new file), so we can selectively pull and process data using Hive. (We can use something like https://github.com/OpenHFT/Chronicle-Queue to write the data to disk.)
>>>>>>>>
>>>>>>>> If the user needs to avoid losing any messages at all in the case of a disk failure, he can either use a SAN or NFS, or run two replicas of the receivers (we should write some code so that only one of the receivers actually puts data into HDFS).
>>>>>>>>
>>>>>>>> Coding-wise, this should not be too hard. I am sure this will be many times faster than Cassandra (of course, we need to do a PoC and verify).
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>> --Srinath
>>>>>>>>
>>>>>>>> --
>>>>>>>> ============================
>>>>>>>> Blog: http://srinathsview.blogspot.com twitter: @srinath_perera
>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>> Phone: 0772360902
>>>>>>>
>>>>>>> --
>>>>>>> *Anjana Fernando*
>>>>>>> Senior Technical Lead
>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>> lean . enterprise . middleware
>>>>>>
>>>>>> --
>>>>>> Sanjiva Weerawarana, Ph.D.
>>>>>> Founder, Chairman & CEO; WSO2, Inc.; http://wso2.com/
>>>>>> email: [email protected]; office: (+1 650 745 4499 | +94 11 214 5345) x5700; cell: +94 77 787 6880 | +1 408 466 5099; voip: +1 650 265 8311
>>>>>> blog: http://sanjiva.weerawarana.org/; twitter: @sanjiva
>>>>>> Lean . Enterprise . Middleware

--
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
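[For reference, the folder-per-stream, file-per-hour layout proposed in Srinath's original mail could look like the sketch below. This is illustrative only: the base path, stream name, hourly bucket format, and `.events` extension are all assumptions, not anything specified in the thread.]

```python
from datetime import datetime

def hdfs_path(stream, ts, base="/bam/events"):
    """Bucket events into one file per stream per hour, so Hive can
    selectively read, say, just the last hour of a given stream."""
    bucket = ts.strftime("%Y-%m-%d-%H")   # 1h of data goes to a new file
    return f"{base}/{stream}/{bucket}.events"

path = hdfs_path("org.wso2.sample", datetime(2014, 11, 7, 8, 24))
print(path)  # /bam/events/org.wso2.sample/2014-11-07-08.events
```

With this layout, a time-bounded batch query only has to open the handful of files whose names fall inside the query window.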
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
