Mike, please advise even if it's critical; I would be more than happy to take it,
since it's oriented towards improving the code.
Thanks for the detailed explanation on the router front and the other
points; they helped me understand a lot of the finer aspects.

I'll explore FSMs some more. Thanks again.

On Wednesday, November 23, 2016 at 6:49:50 PM UTC+5:30, Mike Nielsen wrote:
>
> While I'm far from an expert, and have never deployed Akka in production 
> (so take my advice with a grain of salt, and please don't hear this as any 
> criticism of what you've done), I use routers for a few reasons:
>
>    1. I can control the minimum and maximum number of them via 
>    configuration, and some pool configurations are sensitive to the 
>    workload on the node, so you won't whack the compute node(s) too hard 
>    (see the configuration sketch after this list);
>    2. You can configure worker instances to be run remotely, which means 
>    that it's easier to scale horizontally if need be;
>    3. In my view, it's worthwhile to have explicit management of actors 
>    (as opposed to just spinning them up willy-nilly): I can foresee a risk in 
>    writing a large system where execution contexts become a mixed bag of 
>    actors with a broad spectrum of performance characteristics.  Trolling 
>    through the code later to try to tune performance by partitioning 
>    execution contexts is harder than having these designed up-front.  In your 
>    application, this may not be such a big deal, but, for reasons of personal 
>    preference, I try to avoid coding in a way that boxes me in scale-wise.  
>    Admittedly it's more work, but my thinking is that while I'm learning, I 
>    may as well learn it to the max.  Your mileage, of course, may vary.
>    4. In a larger application, it also seems to me to be a risk that 
>    business logic might become sprayed over actors in a way that makes the 
>    code a pain to maintain.  I'm not sure I have a complete answer on 
>    managing that, but it seems that a central manifest of actors (in the form 
>    of router configurations) in your configuration files might be of benefit.
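>
>    For point 1, here is a minimal sketch of the kind of configuration I 
>    mean (the deployment path, pool bounds and router name below are 
>    made-up values for illustration, not anything from your code):
>
>    # application.conf: a resizable round-robin pool, tunable without a rebuild
>    akka.actor.deployment {
>      /fileScanner/fileParserRouter {
>        router = round-robin-pool
>        resizer {
>          lower-bound = 2
>          upper-bound = 10
>        }
>      }
>    }
>
>    // In FileScanner (import akka.routing.FromConfig), create the pool once
>    // and route every Parse message through it instead of one actor per file:
>    ActorRef parserRouter = getContext().actorOf(
>        FromConfig.getInstance().props(Props.create(FileParser.class, aggregator)),
>        "fileParserRouter");
>    parserRouter.tell(new Parse(file.getAbsolutePath()), getSelf());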
>
> In terms of whether it's OK to have a single actor per file, that seems to 
> be a tacit design decision that says I/O throughput is going to be the 
> factor that limits your application throughput.  If that's always going to 
> be the case, then fine, but if you are going to want to get a high degree 
> of scalability, that may not be a correct assumption: if you have massive 
> files and a distributed filesystem in which I/O throughput scales with the 
> number of compute nodes, then it may be better to work out how to partition 
> work on a single file over multiple nodes.
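>
> As a rough sketch of that idea (the helper name and buffer size below are 
> just mine, for illustration): give each worker a byte range of the file and 
> have it count the newline bytes in that range. Because each '\n' falls in 
> exactly one range, the per-range counts can simply be summed, so no 
> line-boundary stitching is needed.
>
>    // import java.io.IOException;
>    // import java.io.RandomAccessFile;
>    /** Counts '\n' bytes in the byte range [offset, offset + length) of a file. */
>    static long countNewlines(String path, long offset, long length) throws IOException {
>        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
>            raf.seek(offset);
>            byte[] buffer = new byte[64 * 1024];
>            long remaining = length;
>            long newlines = 0;
>            while (remaining > 0) {
>                int read = raf.read(buffer, 0, (int) Math.min(buffer.length, remaining));
>                if (read < 0) {
>                    break; // reached end of file early
>                }
>                for (int i = 0; i < read; i++) {
>                    if (buffer[i] == '\n') {
>                        newlines++;
>                    }
>                }
>                remaining -= read;
>            }
>            return newlines;
>        }
>    }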
>
> As one beginner to another, I have found a couple of features of Akka that 
> are worthwhile:
>
>    1. While not directly related to your questions, FSMs are amazingly 
>    helpful in writing correct code, and therein may lie the answer to your 
>    third query;
>    2. I found streams a little trickier to get the hang of than vanilla 
>    actors, but worth the investment of time (a small sketch follows after 
>    this list).
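>
>    As a taste of that, here is roughly how the per-file line count could be 
>    done with streams (javadsl, as far as I recall it; the 8192 maximum line 
>    length is an arbitrary choice on my part):
>
>    // import akka.stream.ActorMaterializer;
>    // import akka.stream.javadsl.FileIO;
>    // import akka.stream.javadsl.Framing;
>    // import akka.stream.javadsl.FramingTruncation;
>    // import akka.util.ByteString;
>    final ActorMaterializer materializer = ActorMaterializer.create(actorSystem);
>    FileIO.fromPath(Paths.get(filePath))
>        .via(Framing.delimiter(ByteString.fromString("\n"), 8192, FramingTruncation.ALLOW))
>        .runFold(0L, (count, line) -> count + 1, materializer)
>        .thenAccept(count -> System.out.println(filePath + " : " + count + " lines"));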
>
> I hope you find these comments useful.
>
> On Wed, Nov 23, 2016 at 5:25 AM, kk k <cracke...@gmail.com> wrote:
>
>>
>> This is my first program in Akka, so I wanted to know if the program below 
>> is efficient and is using the advantages of the actor model.
>>
>> ----------
>>
>> The program's purpose is to scan a given directory for any files and 
>> print the number of lines in each file.
>>
>> 1. The main `Application` class will create the actor system and send a 
>> Scan message to a `FileScanner` actor.
>> 2. The `FileScanner` actor will scan the given directory, and for each 
>> file it will create a new `FileParser` actor and send a Parse message. 
>> All the FileParser actors are passed the same Aggregator actor reference.
>> 3. The `FileParser` actor will parse the given file, and for each line it 
>> will send a Line message to the Aggregator actor.
>> 4. The `Aggregator` actor will maintain a count of the number of lines 
>> for each file in an instance hashmap and will print the line count for each 
>> "End" message it receives. Once all files are processed, it will shut down 
>> the actor system.
>>
>> ----------
>>
>> A few points which I need help on:
>>
>> 1. There is a separate `FileParser` actor for each file. Is this fine? 
>> What is the benefit of using a router which routes to FileParser actors? 
>> Will its use only help in controlling the number of FileParser actors and 
>> how load is distributed among them?
>>
>> 2. There is a single Aggregator actor which counts the number of lines 
>> for each file. It's using an instance `HashMap` and I hope this is fine. Or 
>> will a separate aggregator actor for each file improve performance?
>>
>> 3. Also, I am passing the number of files to the Aggregator actor when it's 
>> created, so that I can shut down the actor system once all files are 
>> processed. If I had a separate Aggregator for each file, I'm not sure how 
>> to shut down (one idea I had is sketched after these questions, but I'm 
>> not sure it's the right approach).
>>
>> 4. Each file is only sequentially processed, i.e. a `FileParser` actor 
>> reads the file sequentially and then invokes the aggregator for each 
>> line. Is this fine, or can it be improved?
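>>
>> (The idea I mentioned in question 3 is a hypothetical coordinator that 
>> death-watches each per-file aggregator and shuts the system down when all 
>> of them have stopped; each per-file Aggregator would stop itself after 
>> printing its count. The class and message handling below are only a 
>> sketch, not code I have tested.)
>>
>>     // import akka.actor.ActorRef;
>>     // import akka.actor.Terminated;
>>     // import akka.actor.UntypedActor;
>>     public class ShutdownCoordinator extends UntypedActor {
>>
>>         private int remaining;
>>
>>         public ShutdownCoordinator(int numberOfFiles) {
>>             this.remaining = numberOfFiles;
>>         }
>>
>>         public void onReceive(Object message) {
>>             if (message instanceof ActorRef) {
>>                 // FileScanner sends each per-file aggregator's reference here.
>>                 getContext().watch((ActorRef) message);
>>             } else if (message instanceof Terminated) {
>>                 remaining--;
>>                 if (remaining == 0) {
>>                     getContext().system().shutdown();
>>                 }
>>             } else {
>>                 unhandled(message);
>>             }
>>         }
>>     }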
>>
>> ----------
>>
>> **Application**
>>
>> /**
>>  * The Application program bootstraps the actor system for parsing the files
>>  * in a given directory and finding their line counts.
>>  *
>>  * @author 
>>  * @version 1.0
>>  */
>> public class Application {
>>
>>     public void start(String directoryPath) {
>>         ActorSystem actorSystem = ActorSystem.create("logProcessor");
>>         ActorRef fileScanner = actorSystem.actorOf(
>>                 Props.create(FileScanner.class), "fileScanner");
>>         fileScanner.tell(new Scan(directoryPath), ActorRef.noSender());
>>     }
>>
>>     public static void main(String[] args) {
>>         if (args.length < 1) {
>>             System.out.println(
>>                     "Usage: java -jar log-process-1.0-SNAPSHOT.jar <directorypath>");
>>             System.exit(0);
>>         }
>>         String path = args[0];
>>         Application application = new Application();
>>         application.start(path);
>>     }
>> }
>>
>> **FileScanner**
>>
>> /**
>>  * The FileScanner program scans for files in a given directory.
>>  *
>>  * @author 
>>  * @version 1.0
>>  */
>> public class FileScanner extends UntypedActor {
>>
>>     public FileScanner() {
>>     }
>>
>>     /**
>>      * Invoked by the actor system to scan a given directory.
>>      *
>>      * @param message
>>      *            The message to process
>>      */
>>     public void onReceive(Object message) {
>>         if (message instanceof Scan) {
>>             Scan scan = (Scan) message;
>>             System.out.println("Scan directory: " + scan.getDirectory());
>>
>>             // Only top-level files in the directory are read; no recursion is done.
>>             File directory = new File(scan.getDirectory());
>>             // In case of a large number of files, the call below needs to be optimised.
>>             File[] files = directory.listFiles();
>>             if (files == null) {
>>                 // Not a directory, or an I/O error occurred; nothing to do.
>>                 return;
>>             }
>>
>>             // Count only the files and ignore any folders; the count is
>>             // required to shut down the actor system after all files are processed.
>>             int numberOfFiles = 0;
>>             for (File file : files) {
>>                 if (file.isFile())
>>                     numberOfFiles++;
>>             }
>>
>>             ActorRef aggregator = getContext().actorOf(
>>                     Props.create(Aggregator.class, numberOfFiles), "aggregator");
>>
>>             /*
>>              * Use a unique identifier (counter) for actor names, as file
>>              * names can have special characters (e.g. "readme (copy).md")
>>              * and hence cannot be used directly as actor names.
>>              *
>>              * Docs: actor names must not start with `$`, may include only
>>              * ASCII letters and can only contain these special characters:
>>              * -_.*$+:@&=,!~';.
>>              */
>>             for (int i = 0; i < files.length; i++) {
>>                 File file = files[i];
>>                 if (!file.isFile())
>>                     continue;
>>                 System.out.println(file.getName());
>>                 ActorRef parser = getContext().actorOf(
>>                         Props.create(FileParser.class, aggregator), "parser-" + i);
>>                 parser.tell(new Parse(file.getAbsolutePath()), getSelf());
>>             }
>>         } else {
>>             unhandled(message);
>>         }
>>     }
>> }
>>
>> **FileParser**
>>
>> /**
>>  * The FileParser program parses a given file and sends a Line message to
>>  * the aggregator for each line.
>>  *
>>  * @author 
>>  * @version 1.0
>>  */
>> public class FileParser extends UntypedActor {
>>
>>     /**
>>      * An aggregator actor reference to send file events to.
>>      */
>>     private final ActorRef aggregator;
>>
>>     public FileParser(ActorRef aggregator) {
>>         this.aggregator = aggregator;
>>     }
>>
>>     /**
>>      * Invoked when a Parse message is available from FileScanner. It reads
>>      * only text files; any other files are not handled.
>>      *
>>      * @param message
>>      *            The message to process
>>      */
>>     public void onReceive(Object message) {
>>         if (message instanceof Parse) {
>>             Parse parseMessage = (Parse) message;
>>             System.out.println("Parse File : " + parseMessage.getFilePath());
>>
>>             Path file = Paths.get(parseMessage.getFilePath());
>>             // try-with-resources so the reader is always closed
>>             try (BufferedReader reader = Files.newBufferedReader(file,
>>                     StandardCharsets.UTF_8)) {
>>                 Line lineMessage = new Line(parseMessage.getFilePath());
>>                 aggregator.tell(new Start(parseMessage.getFilePath()), getSelf());
>>                 String line;
>>                 while ((line = reader.readLine()) != null) {
>>                     aggregator.tell(lineMessage, getSelf());
>>                 }
>>                 aggregator.tell(new End(parseMessage.getFilePath()), getSelf());
>>             } catch (IOException e) {
>>                 e.printStackTrace();
>>             }
>>         } else {
>>             unhandled(message);
>>         }
>>     }
>> }
>>
>> **Aggregator**
>>
>> /**
>>  * The Aggregator program counts the number of lines for each file.
>>  *
>>  * @author 
>>  * @version 1.0
>>  */
>> public class Aggregator extends UntypedActor {
>>
>>     /**
>>      * Number of files that are to be processed.
>>      */
>>     private final int numberOfFiles;
>>     /**
>>      * Number of lines per file.
>>      */
>>     private final HashMap<String, Integer> lineCount;
>>     /**
>>      * A running count of processed files.
>>      */
>>     private int count;
>>
>>     public Aggregator(int numberOfFiles) {
>>         this.numberOfFiles = numberOfFiles;
>>         lineCount = new HashMap<>();
>>     }
>>
>>     /**
>>      * Invoked when a file event (Start, Line or End) is available from
>>      * FileParser.
>>      *
>>      * @param message
>>      *            The message to process
>>      */
>>     public void onReceive(Object message) {
>>         if (message instanceof End) {
>>             End end = (End) message;
>>             count++;
>>             System.out.println("Line count of file : " + end.getFilePath()
>>                     + " is :" + lineCount.get(end.getFilePath()));
>>             if (count >= numberOfFiles) {
>>                 getContext().system().shutdown();
>>             }
>>         } else if (message instanceof Line) {
>>             Line line = (Line) message;
>>             String path = line.getFilePath();
>>             // A long (or BigInteger) count would be needed for very long files.
>>             if (lineCount.containsKey(path)) {
>>                 lineCount.put(path, lineCount.get(path) + 1);
>>             } else {
>>                 lineCount.put(path, 1);
>>             }
>>         } else if (message instanceof Start) {
>>             // Started processing a new file.
>>         } else {
>>             unhandled(message);
>>         }
>>     }
>> }
>>
>
>
