On Thu, Nov 6, 2014 at 7:19 PM, Srinath Perera <[email protected]> wrote:
> Ah sorry, I misunderstood.
>
> Buffering in memory and writing to HDFS will be faster. By writing to disk, you reduce the probability of losing that data, by making it a bit slower.
>
> However, if you are running two receivers, the probability that you will lose data is lower anyway. So I guess buffering in memory and writing to HDFS would be OK.

Great! Yeah, true. In either approach, and even now, there is anyway a high probability of losing some events if the server fails, because most often there will be a few events in the publisher queue, other in-memory buffers, the OS I/O buffers in the file scenario, and so on. To be totally reliable, we would have to use a transport like JMS to achieve that.

Cheers,
Anjana.

> --Srinath
>
> On Fri, Nov 7, 2014 at 8:24 AM, Anjana Fernando <[email protected]> wrote:
>
>> Hi Srinath,
>>
>> I think that example is a bit flawed :) .. I didn't mean to compare Cassandra with the HDFS case here. I know Cassandra is far more complicated, whereas the data operations in HDFS are very simple, and I have a feeling that, with events that small, it may have turned into a CPU-bound operation rather than an I/O-bound one, because of the processing required for each event (maybe their batch implementation is crappy); that may be why even the bigger batch is also slow. The OS-level buffers you mentioned, yes, they efficiently batch the physical disk writes in memory and flush them out later. But that is a different thing; here we are just writing to the disk and reading it back again, so as I see it, we are simply using the local disk as a buffer, where we could do the same thing in RAM. Basically, build up sizable chunks in memory and write them to HDFS. That way we avoid the, even if comparatively small, overhead of writing to and reading from the local disk, while the bottleneck would still be writing the data out over the network to a remote server's disk somewhere. Simply put, this direct HDFS operation should be able to saturate the network link we have; and even if it can't, we should ask ourselves how writing to the local disk and reading it back again could optimize it any further.
>>
>> Cheers,
>> Anjana.
>>
>> On Thu, Nov 6, 2014 at 6:15 PM, Srinath Perera <[email protected]> wrote:
>>
>>> Of course we need to try it out and verify; I am just making the case that we should try it out :)
>>>
>>> Also, RDBMS should be the default, as most scenarios can be handled with DBs, and there is no reason to make everyone's life complicated.
>>>
>>> --Srinath
>>>
>>> On Fri, Nov 7, 2014 at 7:44 AM, Srinath Perera <[email protected]> wrote:
>>>
>>>> 1) Anjana, you are assuming that bandwidth is the bottleneck. Let me give an example.
>>>>
>>>> With sequential reads and writes, an HDD can do > 100 MB/sec, and a 1G network can do > 50 MB/sec. But the best BAM number we have seen is about 40k events/sec (and that was with 4 machines or so; let's assume one machine). Let's assume 20-byte events. Then it will be doing < 1 MB/sec.
>>>>
>>>> The problem is that Cassandra breaks the data into a lot of small operations, losing the OS-level buffer-to-buffer transfers that file transfers can do. I have tried increasing the batch size for Cassandra, which helps with smaller batches. But after about a few thousand operations in the same batch, things start to get much slower.
>>>>
>>>> The best numbers will come when we run two receivers instead of NFS.
>>>>
>>>> 2) Frank, this is analytics data. So it is read-only, and in most cases we need only time-based queries at low resolution (a 15-minute smallest resolution is fine for most cases). That is to say, run a given batch query on the last hour of data, and so on.
>>>>
>>>> However, we have some scenarios where we do ad-hoc queries for things like activity monitoring. The above would not work for those, and we would have to run a batch job to push that data to an RDBMS or Solr etc. Anjana, we need to discuss this.
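[As a sanity check on the numbers quoted above, the arithmetic can be written out explicitly. This is a quick sketch, not from the original thread: the 40k events/sec rate, 20-byte event size, and disk/network figures are taken from Srinath's mail, while the MiB convention is an assumption.]

```python
# Back-of-the-envelope check of the throughput figures quoted in the thread.
events_per_sec = 40_000        # best observed BAM ingest rate (from the thread)
event_size_bytes = 20          # assumed average event size (from the thread)

ingest_mb_per_sec = events_per_sec * event_size_bytes / (1024 * 1024)
hdd_mb_per_sec = 100           # sequential HDD throughput (> 100 MB/s claimed)
net_mb_per_sec = 50            # usable 1G network throughput (> 50 MB/s claimed)

print(f"ingest rate: {ingest_mb_per_sec:.2f} MB/s")        # well under 1 MB/s
print(f"HDD headroom: ~{hdd_mb_per_sec / ingest_mb_per_sec:.0f}x")
print(f"network headroom: ~{net_mb_per_sec / ingest_mb_per_sec:.0f}x")
```

So even at the best observed rate, ingest uses well under 1% of what a single disk or link can move sequentially, which is the gap the file-based approach is trying to close.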
>>>> But also, there are a lot of use cases where we want to receive and write the events to disk as soon as possible and later run MapReduce on top of them. For those, the above will work.
>>>>
>>>> --Srinath
>>>>
>>>> On Fri, Nov 7, 2014 at 7:23 AM, Anjana Fernando <[email protected]> wrote:
>>>>
>>>>> Hi Sanjiva,
>>>>>
>>>>> On Thu, Nov 6, 2014 at 4:01 PM, Sanjiva Weerawarana <[email protected]> wrote:
>>>>>
>>>>>> Anjana, I think the idea was for the file system -> HDFS upload to happen via a simple cron-job type thing.
>>>>>
>>>>> Even so, we would just be moving the problem to another area; the overall work done by that hardware is still the same (writing to disk, reading it back, writing it to the network). That is, even though we can get to a very high throughput initially by writing to the local disk first, later on we have to read the data back and write it to HDFS over the network, which is the slower part of the operation. So if we keep loading the machine at an extreme throughput, it will eventually run out of disk space.
>>>>>
>>>>> Cheers,
>>>>> Anjana.
>>>>>
>>>>>> On Wed, Nov 5, 2014 at 9:19 AM, Anjana Fernando <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Srinath,
>>>>>>>
>>>>>>> Wouldn't it be better if we just made the batch size bigger? That is, let's have a sizable local in-memory store, probably close to 64 MB, which is the default HDFS block size, and flush the buffer only after it is filled, or maybe when the receiver is idle.
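[The buffered-flush idea described above — fill a sizable in-memory store, then flush it out in one go — can be sketched roughly as follows. This is a minimal illustration, not the actual receiver code: `flush_fn` stands in for whatever actually writes the chunk to HDFS, and the 64 MB default mirrors the HDFS block size mentioned in the thread.]

```python
class EventBuffer:
    """Accumulate serialized events in memory and hand a single large
    chunk to flush_fn once the buffer reaches the threshold (e.g. one
    HDFS block), instead of issuing many small I/O operations."""

    def __init__(self, flush_fn, threshold_bytes=64 * 1024 * 1024):
        self.flush_fn = flush_fn          # e.g. an HDFS append (assumed)
        self.threshold = threshold_bytes
        self.chunks = []
        self.size = 0

    def append(self, event_bytes):
        self.chunks.append(event_bytes)
        self.size += len(event_bytes)
        if self.size >= self.threshold:
            self.flush()

    def flush(self):
        """Flush early too, e.g. when the receiver goes idle."""
        if self.chunks:
            self.flush_fn(b"".join(self.chunks))
            self.chunks, self.size = [], 0
```

The point of the design is that the expensive downstream write always sees one block-sized chunk, so the receiver's per-event cost is just an in-memory append.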
>>>>>>> I was just thinking that writing to the file system first will itself be expensive, since there are the additional steps of writing all the records to the local file system and reading them back, and then finally writing them to HDFS; and of course, having a network file system on top would again be an overhead, not to mention the implementation/configuration complications that would come with it. IMHO, we should try to keep these scenarios as simple as possible.
>>>>>>>
>>>>>>> I'm doing our new BAM data layer implementations here [1]; I'm almost done with an RDBMS implementation and doing some refactoring now (mail on this yet to come :)). I can also do an HDFS one after that and check it.
>>>>>>>
>>>>>>> [1] https://github.com/wso2/carbon-analytics/tree/master/components/xanalytics
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Anjana.
>>>>>>>
>>>>>>> On Tue, Nov 4, 2014 at 6:56 PM, Srinath Perera <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> The following came out of a chat with Sanjiva on a scenario involving a very large number of events coming into BAM.
>>>>>>>>
>>>>>>>> Currently we use Cassandra to store the events; the numbers we got out of it have not been great, and Cassandra needs too much attention to get to those numbers.
>>>>>>>>
>>>>>>>> With Cassandra (or any DB) we write data as records. We can batch them, but the amount of data in one I/O operation is still small. In comparison, file transfers are much, much faster; that is the fastest way to get data from A to B.
>>>>>>>>
>>>>>>>> So I am proposing that we write the events that come in to a local file in the Data Receiver, and periodically append them to an HDFS file. We can arrange the data in a folder per stream and in files by timestamp (e.g. 1h of data goes to a new file), so we can selectively pull and process data using Hive. (We can use something like https://github.com/OpenHFT/Chronicle-Queue to write the data to disk.)
>>>>>>>>
>>>>>>>> If the user needs to avoid losing any messages at all in the case of a disk failure, he can either use a SAN or NFS, or run two replicas of the receivers (we should write some code so that only one of the receivers actually puts data into HDFS).
>>>>>>>>
>>>>>>>> Coding-wise, this should not be too hard. I am sure this will be many times faster than Cassandra (of course, we need to do a PoC and verify).
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>> --Srinath
>>>>>>>>
>>>>>>>> --
>>>>>>>> ============================
>>>>>>>> Blog: http://srinathsview.blogspot.com twitter: @srinath_perera
>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>> Phone: 0772360902
>>>>>>>
>>>>>>> --
>>>>>>> *Anjana Fernando*
>>>>>>> Senior Technical Lead
>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>> lean . enterprise . middleware
>>>>>>
>>>>>> --
>>>>>> Sanjiva Weerawarana, Ph.D.
>>>>>> Founder, Chairman & CEO; WSO2, Inc.; http://wso2.com/
>>>>>> email: [email protected]; office: (+1 650 745 4499 | +94 11 214 5345) x5700; cell: +94 77 787 6880 | +1 408 466 5099; voip: +1 650 265 8311
>>>>>> blog: http://sanjiva.weerawarana.org/; twitter: @sanjiva
>>>>>> Lean . Enterprise . Middleware

--
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
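[For reference, the folder-per-stream, file-per-hour layout proposed in Srinath's original mail could look like the sketch below. This is illustrative only: the base path, stream name, hourly bucket format, and `.events` extension are all assumptions, not anything specified in the thread.]

```python
from datetime import datetime

def hdfs_path(stream, ts, base="/bam/events"):
    """Bucket events into one file per stream per hour, so Hive can
    selectively read, say, just the last hour of a given stream."""
    bucket = ts.strftime("%Y-%m-%d-%H")   # 1h of data goes to a new file
    return f"{base}/{stream}/{bucket}.events"

path = hdfs_path("org.wso2.sample", datetime(2014, 11, 7, 8, 24))
print(path)  # /bam/events/org.wso2.sample/2014-11-07-08.events
```

With this layout, a time-bounded batch query only has to open the handful of files whose names fall inside the query window.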
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
