I think disk read speed will be the limiting factor. I am not sure actors are the best way to solve this. Akka Streams applies back-pressure, so the file reader is held back if you read faster than you process. I would take a look at that. Or just try the Java async file read API. Reading and processing a 2 GB file should only take a few seconds.
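For the async file read idea, here is a minimal sketch using the JDK's `AsynchronousFileChannel`. It only counts newline bytes as a stand-in for the real per-line work. The class name `AsyncLineCount`, the method `countNewlines`, and the 64 KB buffer size are placeholders I made up for illustration, so treat this as something to benchmark against a plain `BufferedReader` loop, not as a recommendation.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

class AsyncLineCount {

    /** Counts '\n' bytes in a file using chained asynchronous reads. */
    static CompletableFuture<Long> countNewlines(Path file) throws IOException {
        AsynchronousFileChannel channel =
                AsynchronousFileChannel.open(file, StandardOpenOption.READ);
        CompletableFuture<Long> result = new CompletableFuture<>();
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024); // assumed chunk size
        readChunk(channel, buffer, 0L, 0L, result);
        // Close the channel whether the read chain succeeded or failed.
        return result.whenComplete((count, error) -> {
            try {
                channel.close();
            } catch (IOException ignored) {
                // nothing useful to do if close fails in this sketch
            }
        });
    }

    /** Issues one read; the handler processes the chunk and schedules the next. */
    private static void readChunk(AsynchronousFileChannel channel, ByteBuffer buffer,
                                  long position, long countSoFar,
                                  CompletableFuture<Long> result) {
        channel.read(buffer, position, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                if (bytesRead < 0) { // end of file reached
                    result.complete(countSoFar);
                    return;
                }
                buffer.flip();
                long count = countSoFar;
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') {
                        count++;
                    }
                }
                buffer.clear();
                readChunk(channel, buffer, position + bytesRead, count, result);
            }

            @Override
            public void failed(Throwable error, Void attachment) {
                result.completeExceptionally(error);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: AsyncLineCount <file>");
            return;
        }
        System.out.println(countNewlines(Paths.get(args[0])).get());
    }
}
```

Calling `countNewlines(path).get()` blocks until the whole file has been read. The chunks are still read one after another, so this does not make the disk any faster. It only frees the calling thread while each read is in flight.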
Don't over engineer this.

On Sat, May 9, 2015, 01:58 Michael Frank <[email protected]> wrote:

> what is the result of the log processing of a single file? is it some
> aggregation or summary, or are you performing some action for each log line?
>
> it seems to me the most performant solution would be to not use actors at
> all, but to create a dedicated dispatcher and process each log file in a
> Future. in this way you maximize your caching (data/instruction/readahead)
> and minimize your context switching. you also don't have to worry about
> the fact that you are using synchronous I/O. if you are
> summarizing/aggregating the log file, then the result of the Future is your
> summary, and you can pipe that result to an actor using pipeTo().
>
> this is optimizing for throughput however, not latency. in order to
> balance throughput vs. latency, you might consider a bimodal approach,
> where files larger than a certain threshold get processed using a
> synchronous approach with Futures, and small files are processed in an
> actor. you could abstract the processing into a trait and share that trait
> between both approaches.
>
> that's just my 2 cents, however, without the benefit of much context.
>
> -Michael
>
> On 05/08/15 15:06, Harit Himanshu wrote:
>
> Hi Idar
>
> I just confirmed with some of our team mates that it depends upon our
> customers.
>
> 1. Some customers use local disk and remove logs after processing.
>    There are customers who use NAS based storage. None uses SSD as per my
>    understanding.
> 2. The logs differ in size a lot. Depends on log rolling rules, this
>    may range from some Megabytes to few Gigabytes.
> 3. The processing is not much. we decide either to ignore the
>    logLine(based on certain condition), encrypt certain data, and build a
>    format(usually JSON).
>
> Do you have better idea or would your recommendation differ based on this
> information?
>
> Thank you
> + Harit Himanshu
>
> On Thursday, May 7, 2015 at 11:44:04 PM UTC-7, Idar Borlaug wrote:
>>
>> What filesystem and disks are you reading the files from? Reading a file
>> in one actor is a good idea, because you can read it sequentially. Reading
>> from 10 different places in the same file can be a lot slower or faster.
>> MPIIO which are used in computational clusters have methods for splitting a
>> file and reading one part each on different nodes.
>>
>> How much processing is there for each line?
>>
>> I would implement both alternatives and do some benchmarking. Maybe a
>> third would be to read the files in each LogLineProcessActor and ditch the
>> FileActor.
>>
>> What would also be cool, is to have an async IO for reading the files. I
>> have no experience with that.
>>
>> On Fri, May 8, 2015 at 2:23 AM Harit Himanshu <[email protected]>
>> wrote:
>>
>>> Hello
>>>
>>> This is what my use case looks like
>>>
>>> *Use Case*
>>>
>>> - Given many log files in range (2MB - 2GB), I need to parse each of
>>> these logs and apply some processing, generate Java POJO.
>>> - For this problem, lets assume that we have just 1 log file
>>> - Also, the idea is to making best use of System. Multiple cores are
>>> available.
>>>
>>> *Alternative 1*
>>> - Open file (synchronous), read each line, generate POJOs
>>>
>>> FileActor -> read each line -> List<POJO>
>>>
>>> *Pros*: simple to understand
>>> *Cons*: Serial Process, not taking advantage of multiple cores in the
>>> system
>>>
>>> *Alternative 2*
>>> - Open File (synchronous), read N lines (N is configurable), pass on to
>>> different actors to process
>>>
>>>                                                    /  LogLineProcessActor 1
>>> FileActor -> LogLineProcessRouter (with 10 Actors) -- LogLineProcessActor 2
>>>                                                    \  LogLineProcessActor 10
>>>
>>> *Pros* Some parallelization, by using different actors to process part
>>> of lines. Actors will make use of available cores in the system (? how, may
>>> be?)
>>> *Cons* Still Serial, because file read in serial fashion
>>>
>>> *Questions*
>>> - is any of the above choice a good choice?
>>> - Are there better alternatives?
>>>
>>> Please provide valuable thoughts here
>>>
>>> Thanks a lot
>>> --
>>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>>> >>>>>>>>>> Check the FAQ:
>>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>> >>>>>>>>>> Search the archives:
>>> https://groups.google.com/group/akka-user
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "Akka User List" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/akka-user.
>>> For more options, visit https://groups.google.com/d/optout.
