Ah sorry, I misunderstood. Buffering in memory and writing to HDFS will be faster. By writing to disk first, you reduce the probability of losing that data, at the cost of making it a bit slower.
However, if you are running two receivers, the probability that you will lose data is lower anyway. So I guess buffering in memory and writing to HDFS would be OK.

--Srinath

On Fri, Nov 7, 2014 at 8:24 AM, Anjana Fernando <[email protected]> wrote:

> Hi Srinath,
>
> I think that example is a bit flawed :) .. I didn't mean to compare
> Cassandra with the HDFS case here. I know Cassandra is far more complicated
> than the HDFS case, where the data operations are very simple. I have a
> feeling that, with events that small, it may have turned into a CPU-bound
> operation rather than an I/O-bound one, because of the processing required
> for each event (maybe their batch implementation is poor); that may be why
> even the bigger batches are also slow.
>
> As for the OS-level buffers you mentioned: yes, they efficiently batch the
> physical disk writes in memory and flush them out later. But that is a
> different thing. Here, we are just writing to the disk and reading it back
> again, so as I see it, we are just using the local disk as a buffer, where
> we could do the same thing in RAM. Basically, build up sizable chunks in
> memory and write them to HDFS. That way we avoid the (comparatively small)
> overhead of writing to and reading from the local disk, while the
> bottleneck remains writing the data out over the network to a remote
> server's disk somewhere. Simply put, this direct HDFS operation should be
> able to saturate the network link we have; and even if it can't, we can ask
> ourselves how writing to the local disk and reading it back again could
> possibly make it faster.
>
> Cheers,
> Anjana.
>
> On Thu, Nov 6, 2014 at 6:15 PM, Srinath Perera <[email protected]> wrote:
>
>> Of course we need to try it out and verify; I am just making the case
>> that we should try it out :)
>>
>> Also, RDBMS should be the default, as most scenarios can be handled with
>> DBs, and there is no reason to make everyone's life complicated.
>>
>> --Srinath
>>
>> On Fri, Nov 7, 2014 at 7:44 AM, Srinath Perera <[email protected]> wrote:
>>
>>> 1) Anjana, you are assuming that bandwidth is the bottleneck. Let me
>>> give an example.
>>>
>>> With sequential reads and writes, an HDD can do >100 MB/sec and a 1G
>>> network can do >50 MB/sec. But the best number we have seen from BAM is
>>> about 40k events/sec (and that was with 4 machines or so; let's assume one
>>> machine). Let's assume 20-byte events. Then it is doing <1 MB/sec.
>>>
>>> The problem is that Cassandra breaks the data into lots of small
>>> operations, losing the OS-level buffer-to-buffer transfers that file
>>> transfers can do. I have tried increasing the batch size for Cassandra,
>>> which helps with smaller batches, but after about a few thousand
>>> operations in the same batch, things start to get much slower.
>>>
>>> The best numbers will come when we run two receivers instead of NFS.
>>>
>>> 2) Frank, this is analytics data. So it is read-only, and in most cases
>>> we need only time-based queries at coarse resolution (a 15-minute
>>> smallest resolution is fine for most cases), i.e. "run this batch query
>>> on the last hour of data" and so on.
>>>
>>> However, we have some scenarios where we do ad-hoc queries for things
>>> like activity monitoring. The above would not work for those, and we
>>> would have to run a batch job to push that data to an RDBMS or Solr etc.
>>> Anjana, we need to discuss this.
>>>
>>> But there are also a lot of use cases where we need to receive and write
>>> the events to disk as soon as possible and later run MapReduce on top of
>>> them. For those, the above will work.
>>>
>>> --Srinath
>>>
>>> On Fri, Nov 7, 2014 at 7:23 AM, Anjana Fernando <[email protected]> wrote:
>>>
>>>> Hi Sanjiva,
>>>>
>>>> On Thu, Nov 6, 2014 at 4:01 PM, Sanjiva Weerawarana <[email protected]>
>>>> wrote:
>>>>
>>>>> Anjana I think the idea was for the file system -> HDFS upload to
>>>>> happen via a simple cron job type thing.
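For reference, Srinath's back-of-the-envelope numbers above work out as follows. This is just a quick sketch of the arithmetic; the 40k events/sec and 20-byte figures are the ones quoted in the thread, and the HDD and network figures are the rough sequential-I/O capacities he cites:

```java
public class ThroughputEstimate {
    public static void main(String[] args) {
        long eventsPerSec = 40_000;   // best observed BAM rate (from the thread)
        long bytesPerEvent = 20;      // assumed small-event size
        double mbPerSec = (eventsPerSec * bytesPerEvent) / (1024.0 * 1024.0);

        double hddMbPerSec = 100.0;   // sequential HDD throughput quoted above
        double netMbPerSec = 50.0;    // usable 1G network throughput quoted above

        System.out.printf("event throughput:  %.2f MB/sec%n", mbPerSec);
        System.out.printf("HDD utilization:   %.1f%%%n", 100 * mbPerSec / hddMbPerSec);
        System.out.printf("link utilization:  %.1f%%%n", 100 * mbPerSec / netMbPerSec);
    }
}
```

So at 40k small events/sec the record-at-a-time path is using well under 2% of either the disk or the link, which is the point being made: the bottleneck is per-operation overhead, not bandwidth.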
>>>>>
>>>>
>>>> Even so, we would just be moving the problem to another area; the
>>>> overall work done by the hardware is still the same (writing to disk,
>>>> reading it back, writing it to the network). That is, even though we can
>>>> get to a very high throughput initially by writing to the local disk
>>>> first, later on we have to read it back and write it to HDFS over the
>>>> network, which is the slower part of the operation. So if we keep
>>>> loading the machine at an extreme throughput, we will eventually run out
>>>> of space on that disk.
>>>>
>>>> Cheers,
>>>> Anjana.
>>>>
>>>>> On Wed, Nov 5, 2014 at 9:19 AM, Anjana Fernando <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Srinath,
>>>>>>
>>>>>> Wouldn't it be better if we just make the batch size bigger? That is,
>>>>>> let's have a sizable local in-memory store, probably close to 64MB,
>>>>>> which is the default HDFS block size, and only flush the buffer once
>>>>>> it is filled, or perhaps when the receiver is idle. I was just
>>>>>> thinking that writing to the file system first is itself expensive,
>>>>>> since there are the additional steps of writing all the records to the
>>>>>> local file system, reading them back again, and then finally writing
>>>>>> them to HDFS. And of course, a network file system would again be an
>>>>>> overhead, not to mention the implementation/configuration
>>>>>> complications that would come with it. IMHO, we should keep these
>>>>>> scenarios as simple as possible.
>>>>>>
>>>>>> I'm doing our new BAM data layer implementations here [1], where I'm
>>>>>> almost done with an RDBMS implementation and doing some refactoring
>>>>>> now (mail on this yet to come :)). I can also do an HDFS one after
>>>>>> that and check it.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/wso2/carbon-analytics/tree/master/components/xanalytics
>>>>>>
>>>>>> Cheers,
>>>>>> Anjana.
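The in-memory buffering Anjana describes above could be sketched roughly like this. This is a minimal illustration, not a committed design: `ChunkSink` is a hypothetical interface standing in for the real HDFS output stream, and the 64 MB threshold is the default HDFS block size mentioned in the thread:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

/** Hypothetical sink; in practice this would wrap an HDFS output stream. */
interface ChunkSink {
    void write(byte[] chunk) throws IOException;
}

/** Accumulates events in RAM and flushes once a block-sized chunk builds up. */
class BufferedEventWriter {
    private final ChunkSink sink;
    private final int flushThreshold;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    BufferedEventWriter(ChunkSink sink, int flushThreshold) {
        this.sink = sink;
        this.flushThreshold = flushThreshold; // e.g. 64 * 1024 * 1024 (HDFS block size)
    }

    synchronized void append(byte[] event) throws IOException {
        buffer.write(event, 0, event.length);
        if (buffer.size() >= flushThreshold) {
            flush();
        }
    }

    /** Would also be called periodically when the receiver is idle, per the proposal. */
    synchronized void flush() throws IOException {
        if (buffer.size() > 0) {
            sink.write(buffer.toByteArray());
            buffer.reset();
        }
    }
}
```

With a block-sized threshold, each flush hands HDFS one large sequential write instead of many small record writes, which is exactly the buffer-to-buffer behavior the thread argues for; the trade-off, as discussed above, is that up to one buffer's worth of data is lost if the receiver dies before a flush.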
>>>>>>
>>>>>> On Tue, Nov 4, 2014 at 6:56 PM, Srinath Perera <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> The following came out of a chat with Sanjiva on a scenario
>>>>>>> involving a very large number of events coming into BAM.
>>>>>>>
>>>>>>> Currently we use Cassandra to store the events; the numbers we got
>>>>>>> out of it have not been great, and Cassandra needs too much attention
>>>>>>> to get even those numbers.
>>>>>>>
>>>>>>> With Cassandra (or any DB) we write data as records. We can batch
>>>>>>> them, but the amount of data in one I/O operation is still small. In
>>>>>>> comparison, file transfers are much, much faster, and they are the
>>>>>>> fastest way to get data from A to B.
>>>>>>>
>>>>>>> So I am proposing to write the events that come in to a local file
>>>>>>> in the Data Receiver, and periodically append them to an HDFS file.
>>>>>>> We can arrange the data in folders by stream and files by timestamp
>>>>>>> (e.g. 1h of data goes to a new file), so we can selectively pull and
>>>>>>> process data using Hive. (We can use something like
>>>>>>> https://github.com/OpenHFT/Chronicle-Queue to write data to disk.)
>>>>>>>
>>>>>>> If the user needs to avoid losing any messages at all in the case of
>>>>>>> a disk failure, he can either have a SAN or NFS, or run two replicas
>>>>>>> of the receivers (we should write some code so that only one of the
>>>>>>> receivers actually puts data to HDFS).
>>>>>>>
>>>>>>> Coding-wise, this should not be too hard. I am sure this will be a
>>>>>>> factor faster than Cassandra (of course, we need to do a PoC and
>>>>>>> verify).
>>>>>>>
>>>>>>> WDYT?
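The folder-by-stream, file-per-hour layout proposed above could look roughly like this. This is only a sketch; the base path, the `.dat` suffix, and the hour-granularity naming are illustrative assumptions, not anything decided in the thread:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/** Maps an event's stream and timestamp to an HDFS-style path, rolling files hourly. */
class EventPathResolver {
    private final String basePath;

    EventPathResolver(String basePath) {
        this.basePath = basePath;
    }

    String resolve(String streamName, long timestampMillis) {
        SimpleDateFormat hourFmt = new SimpleDateFormat("yyyy-MM-dd-HH");
        hourFmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        // e.g. /bam/events/org.wso2.sample.stream/2014-11-07-08.dat
        return basePath + "/" + streamName + "/"
                + hourFmt.format(new Date(timestampMillis)) + ".dat";
    }
}
```

Because each hour maps to its own file, a Hive query over "the last hour of data" only needs to read one file per stream rather than scanning everything, which is what makes the selective time-based pull cheap.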
>>>>>>>
>>>>>>> --Srinath
>>>>>>>
>>>>>>> --
>>>>>>> ============================
>>>>>>> Blog: http://srinathsview.blogspot.com twitter: @srinath_perera
>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>> Phone: 0772360902
>>>>>>
>>>>>> --
>>>>>> *Anjana Fernando*
>>>>>> Senior Technical Lead
>>>>>> WSO2 Inc. | http://wso2.com
>>>>>> lean . enterprise . middleware
>>>>>
>>>>> --
>>>>> Sanjiva Weerawarana, Ph.D.
>>>>> Founder, Chairman & CEO; WSO2, Inc.; http://wso2.com/
>>>>> email: [email protected]; office: (+1 650 745 4499 | +94 11 214 5345)
>>>>> x5700; cell: +94 77 787 6880 | +1 408 466 5099; voip: +1 650 265 8311
>>>>> blog: http://sanjiva.weerawarana.org/; twitter: @sanjiva
>>>>> Lean . Enterprise . Middleware
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
