Hi Idar

I checked with some of my teammates, and the answers depend on the 
customer:


   1. Some customers use local disks and remove the logs after processing. 
   Others use NAS-based storage. As far as I know, none of them use SSDs.
   2. The logs vary a lot in size. Depending on the log rolling rules, a 
   file may range from a few megabytes to a few gigabytes.
   3. There is not much processing per line. For each log line we decide 
   whether to ignore it (based on certain conditions), encrypt certain 
   data, and build an output format (usually JSON).
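
To make step 3 concrete, here is a rough Java sketch of the per-line 
work. It is only an illustration, not our actual code. The class and 
method names, the ignore rule (blank lines and DEBUG entries), the line 
layout, the use of AES-GCM, and the hand-built JSON are all assumptions.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Optional;

// Hypothetical per-line processor: ignore, encrypt one field, build JSON.
final class LogLineSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Generates a throwaway AES key; real key management is out of scope.
    static SecretKey newKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            return kg.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumed ignore rule: skip blank lines and DEBUG entries.
    static boolean shouldIgnore(String line) {
        return line.trim().isEmpty() || line.contains(" DEBUG ");
    }

    // Encrypts one value with AES-GCM; returns Base64(iv || ciphertext).
    static String encrypt(String value, SecretKey key) {
        try {
            byte[] iv = new byte[12];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] ct = cipher.doFinal(value.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + ct.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ct, 0, out, iv.length, ct.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Minimal escaping (backslash and quote only); not a full JSON encoder.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Assumed line layout: "<timestamp> <level> <user> <message...>".
    // Returns empty for ignored or malformed lines.
    static Optional<String> process(String line, SecretKey key) {
        if (shouldIgnore(line)) {
            return Optional.empty();
        }
        String[] parts = line.split(" ", 4);
        if (parts.length < 4) {
            return Optional.empty();
        }
        String json = "{\"timestamp\":\"" + escape(parts[0]) + "\","
                + "\"level\":\"" + escape(parts[1]) + "\","
                + "\"user\":\"" + encrypt(parts[2], key) + "\","
                + "\"message\":\"" + escape(parts[3]) + "\"}";
        return Optional.of(json);
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        System.out.println(process("2015-05-08T02:23:00 INFO alice logged in", key));
        System.out.println(process("2015-05-08T02:23:01 DEBUG alice cache hit", key));
    }
}
```

Each call to process is independent of every other line, so it could 
run in any LogLineProcessActor. The encryption is probably the most 
expensive part, which is worth keeping in mind when benchmarking.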

Do you have a better idea, or would your recommendation differ based on 
this information?

Thank you
+ Harit Himanshu


On Thursday, May 7, 2015 at 11:44:04 PM UTC-7, Idar Borlaug wrote:
>
> What filesystem and disks are you reading the files from? Reading a file 
> in one actor is a good idea, because you can read it sequentially. Reading 
> from 10 different places in the same file can be a lot slower or a lot 
> faster, depending on the storage. MPI-IO, which is used in computational 
> clusters, has methods for splitting a file and reading one part on each 
> of several nodes.
>
> How much processing is there for each line?
>
> I would implement both alternatives and do some benchmarking. Maybe a 
> third alternative would be to read the files in each LogLineProcessActor 
> and ditch the FileActor.
>
> It would also be cool to use async IO for reading the files, but I have 
> no experience with that.
>
> On Fri, May 8, 2015 at 2:23 AM Harit Himanshu <[email protected]> wrote:
>
>> Hello 
>>
>> This is what my use case looks like 
>>
>> *Use Case*
>>
>> - Given many log files ranging in size from 2 MB to 2 GB, I need to 
>> parse each of them, apply some processing, and generate Java POJOs.
>> - For this problem, let's assume that we have just 1 log file.
>> - The idea is to make the best use of the system. Multiple cores are 
>> available.
>>
>> *Alternative 1*
>> - Open file (synchronous), read each line, generate POJOs
>>
>> FileActor -> read each line -> List<POJO>  
>>
>> *Pros*: simple to understand
>> *Cons*: Serial Process, not taking advantage of multiple cores in the 
>> system
>>
>> *Alternative 2*
>> - Open File (synchronous), read N lines (N is configurable), pass on to 
>> different actors to process
>>
>>                                                     / LogLineProcessActor 1
>> FileActor -> LogLineProcessRouter (with 10 Actors) -- LogLineProcessActor 2
>>                                                     \ LogLineProcessActor 10
>>
>> *Pros*: Some parallelization, by using different actors to process 
>> batches of lines. The actors should make use of the available cores in 
>> the system (though I am not sure how).
>> *Cons*: Still serial, because the file is read in serial fashion
>>
>> *Questions*
>> - Is either of the above a good choice?
>> - Are there better alternatives?
>>
>> Please share your thoughts.
>>
>> Thanks a lot
>>
>> -- 
>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>> >>>>>>>>>> Check the FAQ: 
>> http://doc.akka.io/docs/akka/current/additional/faq.html
>> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "Akka User List" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/akka-user.
>> For more options, visit https://groups.google.com/d/optout.
>>
>

