what is the result of the log processing of a single file? is it some
aggregation or summary, or are you performing some action for each log line?
it seems to me the most performant solution would be to not use actors
at all, but to create a dedicated dispatcher and process each log file
in a Future. in this way you maximize your caching
(data/instruction/readahead) and minimize your context switching. you
also don't have to worry about the fact that you are using synchronous
I/O. if you are summarizing/aggregating the log file, then the result
of the Future is your summary, and you can pipe that result to an actor
using pipeTo().
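To make the idea above concrete, here is a minimal sketch in plain Java, assuming a fixed thread pool plays the role of the dedicated dispatcher. The names PerFileFutures, Summary and summarize are hypothetical, and the "keep non-blank lines" filter is only a placeholder for the real rule. No Akka is used here; thenAccept merely stands in for piping the result to an actor.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

final class PerFileFutures {
    /** Hypothetical summary: how many lines were seen and how many were kept. */
    static final class Summary {
        final long total;
        final long kept;
        Summary(long total, long kept) { this.total = total; this.kept = kept; }
        @Override public String toString() { return "total=" + total + ", kept=" + kept; }
    }

    /** Reads one file sequentially, start to finish, on whichever thread calls it. */
    static Summary summarize(Path file) {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            long total = 0;
            long kept = 0;
            for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                total++;
                if (!line.trim().isEmpty()) kept++; // placeholder filter, not the real rule
            }
            return new Summary(total, kept);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** One future per file, all running on the dedicated pool. */
    static List<CompletableFuture<Summary>> summarizeAll(List<Path> files, ExecutorService pool) {
        return files.stream()
                .map(f -> CompletableFuture.supplyAsync(() -> summarize(f), pool))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Assumption: one blocking task per core; tune the size for your storage.
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        Path tmp = Files.createTempFile("sample", ".log");
        try {
            Files.write(tmp, Arrays.asList("first", "", "third"));
            // thenAccept stands in for handing the summary to an actor.
            summarizeAll(Arrays.asList(tmp), pool)
                    .forEach(f -> f.thenAccept(System.out::println).join());
        } finally {
            pool.shutdown();
            Files.delete(tmp);
        }
    }
}
```

Because each file is read by a single thread from start to finish, the blocking I/O never runs on the threads that serve actors.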
this optimizes for throughput, however, not latency. in order to
balance throughput vs. latency, you might consider a bimodal approach,
where files larger than a certain threshold get processed using a
synchronous approach with Futures, and small files are processed in an
actor. you could abstract the processing into a trait and share that
trait between both approaches.
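A rough sketch of that bimodal split, in plain Java rather than Akka. A Java interface plays the role of the shared trait. The names BimodalRouting, LogLogic, Mode and the threshold parameter are all hypothetical, and lines are passed in as a list only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

final class BimodalRouting {
    /** Shared processing logic: the "trait" that both paths reuse. */
    interface LogLogic {
        /** Returns the output for one line, or null when the line is ignored. */
        String handle(String line);
    }

    enum Mode { BULK_FUTURE, ACTOR }

    /** Files larger than the threshold take the throughput-oriented path. */
    static Mode choose(long sizeBytes, long thresholdBytes) {
        return sizeBytes > thresholdBytes ? Mode.BULK_FUTURE : Mode.ACTOR;
    }

    /** Applies the shared logic to every line and drops the ignored ones. */
    static List<String> run(List<String> lines, LogLogic logic) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String result = logic.handle(line);
            if (result != null) out.add(result);
        }
        return out;
    }

    /** Bulk path: the whole file is a single task on a dedicated executor. */
    static CompletableFuture<List<String>> bulk(List<String> lines, LogLogic logic, Executor dedicated) {
        return CompletableFuture.supplyAsync(() -> run(lines, logic), dedicated);
    }

    // Actor path (not shown): an actor's message handler would call run(...)
    // directly with the same LogLogic instance.
}
```

The point of the design is that only choose() and the two entry points differ; the per-line logic is written once.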
that's just my 2 cents, however, without the benefit of much context.
-Michael
On 05/08/15 15:06, Harit Himanshu wrote:
Hi Idar
I just confirmed with some of our team mates that it depends upon our
customers.
1. Some customers use local disk and remove logs after processing.
Others use NAS-based storage. None use SSDs, as far as I
understand.
2. The logs differ a lot in size. Depending on the log rolling rules,
they range from a few megabytes to a few gigabytes.
3. There is not much processing. For each log line we either ignore
it (based on certain conditions), or encrypt certain data and
build an output format (usually JSON).
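The per-line work described in item 3 could look roughly like the sketch below. Everything specific in it is an assumption: the "LEVEL user message" line layout, the rule that drops DEBUG lines, and the field names are invented for illustration. The protect argument is where real encryption would be plugged in; the placeholderProtect helper is Base64 encoding, which is NOT encryption and is there only so the example runs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;
import java.util.function.UnaryOperator;

final class LogLineTransform {
    /** Escapes the characters that must not appear raw inside a JSON string. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Returns empty when the line is ignored, otherwise one JSON object as text. */
    static Optional<String> process(String line, UnaryOperator<String> protect) {
        // Hypothetical layout: "LEVEL user message..."
        String[] parts = line.split(" ", 3);
        if (parts.length < 3 || parts[0].equals("DEBUG")) {
            return Optional.empty(); // placeholder ignore rule
        }
        String json = "{\"level\":\"" + jsonEscape(parts[0]) + "\","
                + "\"user\":\"" + jsonEscape(protect.apply(parts[1])) + "\","
                + "\"message\":\"" + jsonEscape(parts[2]) + "\"}";
        return Optional.of(json);
    }

    /** Stand-in only: Base64 is an encoding, NOT encryption. */
    static String placeholderProtect(String value) {
        return Base64.getEncoder().encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(process("INFO alice login ok", LogLineTransform::placeholderProtect));
        System.out.println(process("DEBUG alice noisy detail", LogLineTransform::placeholderProtect));
    }
}
```

Since this function touches no shared state, it can be called from an actor, a Future, or a plain loop without changes.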
Do you have a better idea, or would your recommendation differ based on
this information?
Thank you
+ Harit Himanshu
On Thursday, May 7, 2015 at 11:44:04 PM UTC-7, Idar Borlaug wrote:
What filesystem and disks are you reading the files from? Reading
a file in one actor is a good idea, because you can read it
sequentially. Reading from 10 different places in the same file
can be a lot slower or a lot faster, depending on the storage.
MPI-IO, which is used in computational clusters, has methods for
splitting a file and reading one part on each node.
How much processing is there for each line?
I would implement both alternatives and do some benchmarking. Maybe
a third option would be to read the files in each
LogLineProcessActor and ditch the FileActor.
It would also be cool to use async IO for reading the
files. I have no experience with that.
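For reference, the JDK does ship an asynchronous file API, java.nio.channels.AsynchronousFileChannel. The sketch below is only an illustration of how it is driven: it counts newline bytes chunk by chunk and completes a future at end of file. The class name AsyncLineCount and the chunk size are hypothetical. Note that on some platforms the JDK backs this channel with a background thread pool, so it mainly frees the calling thread and may not be faster than a plain sequential read; benchmarking would be needed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

final class AsyncLineCount {
    /** Counts '\n' bytes in a file without blocking the calling thread. */
    static CompletableFuture<Long> countNewlines(Path file, int chunkSize) {
        CompletableFuture<Long> result = new CompletableFuture<>();
        try {
            AsynchronousFileChannel channel =
                    AsynchronousFileChannel.open(file, StandardOpenOption.READ);
            // Close the channel once the result is known, success or failure.
            result.whenComplete((count, error) -> {
                try { channel.close(); } catch (IOException ignored) { }
            });
            readChunk(channel, ByteBuffer.allocate(chunkSize), 0L, 0L, result);
        } catch (IOException e) {
            result.completeExceptionally(e);
        }
        return result;
    }

    /** Requests one chunk; the handler counts it and requests the next one. */
    private static void readChunk(AsynchronousFileChannel channel, ByteBuffer buffer,
                                  long position, long countSoFar,
                                  CompletableFuture<Long> result) {
        buffer.clear();
        channel.read(buffer, position, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                if (bytesRead < 0) {          // end of file reached
                    result.complete(countSoFar);
                    return;
                }
                buffer.flip();
                long count = countSoFar;
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') count++;
                }
                readChunk(channel, buffer, position + bytesRead, count, result);
            }

            @Override
            public void failed(Throwable error, Void attachment) {
                result.completeExceptionally(error);
            }
        });
    }
}
```

A caller would use something like countNewlines(path, 64 * 1024) and attach a callback to the returned future instead of blocking on it.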
On Fri, May 8, 2015 at 2:23 AM Harit Himanshu
<[email protected]> wrote:
Hello
This is what my use case looks like
*Use Case*
- Given many log files in range (2MB - 2GB), I need to parse
each of these logs, apply some processing, and generate Java
|POJO|s.
- For this problem, let's assume that we have just |1| log file.
- Also, the idea is to make the best use of the system. Multiple
cores are available.
*Alternative 1*
- Open file (synchronous), read each line, generate |POJO|s
|FileActor -> read each line -> List<POJO>|
*/Pros/*: simple to understand
*/Cons/*: Serial Process, not taking advantage of multiple
cores in the system
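Stripped of the actor, Alternative 1 amounts to something like the following sketch. LogEntry is a hypothetical POJO, since the thread does not say which fields the real one carries, and the reader is passed in so the same method works for a file or an in-memory string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class SequentialParse {
    /** Hypothetical POJO; the real fields are not given in the thread. */
    static final class LogEntry {
        final long lineNumber;
        final String text;
        LogEntry(long lineNumber, String text) {
            this.lineNumber = lineNumber;
            this.text = text;
        }
    }

    /** Reads every line on the calling thread and turns each one into a POJO. */
    static List<LogEntry> parse(BufferedReader reader) {
        List<LogEntry> entries = new ArrayList<>();
        try {
            long n = 0;
            for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                entries.add(new LogEntry(++n, line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        List<LogEntry> entries = parse(new BufferedReader(new StringReader("one\ntwo")));
        System.out.println(entries.size() + " entries parsed");
    }
}
```

One thing to keep in mind: the whole result list lives in memory, which may need a lot of heap for files near the 2GB end of the range.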
*Alternative 2*
- Open File (synchronous), read |N| lines (|N| is
configurable), pass on to different actors to process
|                                                   /   LogLineProcessActor 1
FileActor -> LogLineProcessRouter (with 10 Actors) --  LogLineProcessActor 2
                                                   \   LogLineProcessActor 10|
*/Pros/*: Some parallelization, by using different actors to
process batches of lines. Actors will make use of the available
cores in the system (how, exactly?)
*/Cons/*: Still serial, because the file is read in serial fashion
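The shape of Alternative 2 can be sketched without actors as one reader thread that cuts the file into batches of N lines and a worker pool that processes the batches in parallel. This is an analogy in plain Java, not Akka routing: an ExecutorService stands in for the router and its 10 routees, and the names BatchedFanOut and perLine are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

final class BatchedFanOut {
    /** Reads serially, processes batches in parallel, returns results in file order. */
    static <T> List<T> process(BufferedReader reader, int batchSize,
                               ExecutorService workers, Function<String, T> perLine) {
        List<Future<List<T>>> pending = new ArrayList<>();
        try {
            List<String> batch = new ArrayList<>(batchSize);
            for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    pending.add(submit(workers, batch, perLine));
                    batch = new ArrayList<>(batchSize); // workers own the old list now
                }
            }
            if (!batch.isEmpty()) {
                pending.add(submit(workers, batch, perLine)); // final partial batch
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        List<T> results = new ArrayList<>();
        for (Future<List<T>> future : pending) {
            try {
                results.addAll(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        return results;
    }

    /** Hands one batch to the pool; each worker applies perLine to its own lines. */
    private static <T> Future<List<T>> submit(ExecutorService workers, List<String> batch,
                                              Function<String, T> perLine) {
        return workers.submit(() -> {
            List<T> out = new ArrayList<>(batch.size());
            for (String line : batch) {
                out.add(perLine.apply(line));
            }
            return out;
        });
    }
}
```

As written, the reader never waits for the workers, so a slow perLine could leave most of a large file queued in memory; a real version would probably need some form of back-pressure.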
*Questions*
- Is either of the above a good choice?
- Are there better alternatives?
Please provide valuable thoughts here
Thanks a lot
--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ:
http://doc.akka.io/docs/akka/current/additional/faq.html
<http://doc.akka.io/docs/akka/current/additional/faq.html>
>>>>>>>>>> Search the archives:
https://groups.google.com/group/akka-user
<https://groups.google.com/group/akka-user>
---
You received this message because you are subscribed to the
Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from
it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/akka-user
<http://groups.google.com/group/akka-user>.
For more options, visit https://groups.google.com/d/optout
<https://groups.google.com/d/optout>.