> On March 21, 2014, 4:28 a.m., Chris Riccomini wrote:
> > build.gradle, line 88
> > <https://reviews.apache.org/r/19481/diff/2/?file=530753#file530753line88>
> >
> >     Should not depend on slf4j-log4j in a library. If you want log4j for 
> > the checkpointTool task, I think we can set it in classpath = ... below.
> >     
> >     Also, for simple task, we should just use slf4j-simple instead of 
> > slf4j-log4j, which just writes to STDOUT. This means we can skip the 
> > log4j.properties file below, as well.

I originally used slf4j-simple, but changed it to slf4j-log4j in v2 of this RB. 
The reason was that when you run the tool through the shell script, slf4j-log4j 
is already on the classpath, so adding slf4j-simple causes slf4j to complain 
about having multiple implementations.

Oddly enough, if I just remove this line, logging still works. Not quite sure 
why. Log4j itself seems to be brought in as a transitive dependency of zkclient 
and zookeeper, but I can't see anything depending on slf4j-log4j (when running 
`./gradlew samza-kafka:dependencies`).


> On March 21, 2014, 4:28 a.m., Chris Riccomini wrote:
> > samza-kafka/src/main/resources/checkpoint-tool-log4j.properties, line 1
> > <https://reviews.apache.org/r/19481/diff/2/?file=530755#file530755line1>
> >
> >     No need for this file if we just use slf4j-simple in the checkpointTool 
> > task.

I'd also prefer not to have to bother with the Log4j config, but see above.


> On March 21, 2014, 4:28 a.m., Chris Riccomini wrote:
> > build.gradle, line 105
> > <https://reviews.apache.org/r/19481/diff/2/?file=530753#file530753line105>
> >
> >     This is pretty useful. Two thoughts:
> >     
> >     1. I think we should move it to samza-shell.
> >     2. We should make two tasks, runJob and writeCheckpoint, so we can run 
> > both the JobRunner and the CheckpointTool.

Sure, done -- but it means we need to add samza-kafka and samza-yarn as 
dependencies of samza-shell. Are you ok with that?


> On March 21, 2014, 4:28 a.m., Chris Riccomini wrote:
> > samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala, line 44
> > <https://reviews.apache.org/r/19481/diff/2/?file=530754#file530754line44>
> >
> >     Move to samza-core's util package.

Done.


> On March 21, 2014, 4:28 a.m., Chris Riccomini wrote:
> > samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala, line 88
> > <https://reviews.apache.org/r/19481/diff/2/?file=530754#file530754line88>
> >
> >     Can we combine these two methods (parser.parse, and loadConfig into one 
> > method? It seems like we would always use these two methods back-to-back 
> > for every CommandLine. Maybe just cmdline.loadConfig(args: _*)

I did it this way in order to make subclassing and overriding of the loadConfig 
method nicer. If we passed in the args, the subclass and superclass would have 
to each parse the argument list. Like this, the argument list only gets parsed 
once.


> On March 21, 2014, 4:28 a.m., Chris Riccomini wrote:
> > samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/CheckpointTool.scala,
> >  line 34
> > <https://reviews.apache.org/r/19481/diff/2/?file=530756#file530756line34>
> >
> >     This is a great tool, but I think it not need be Kafka-specific. Can 
> > you make it generic (the way JobRunner is), and move it into samza-core's 
> > checkpoint package? This way, we can use the checkpoint tool whether we use 
> > a KafkaCheckpointManager, HDFSCheckpointManager, 
> > ZookeeperCheckpointManager, etc.

Good idea, done. The only downside is that the KafkaCheckpointManagerFactory 
and CheckpointTool now both need to call Util.getInputStreamPartitions, which 
is a bit wasteful. But given that this is going to be run so rarely, the extra 
few milliseconds of startup time won't hurt anyone.


> On March 21, 2014, 4:28 a.m., Chris Riccomini wrote:
> > samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/CheckpointTool.scala,
> >  line 107
> > <https://reviews.apache.org/r/19481/diff/2/?file=530756#file530756line107>
> >
> >     To make the checkpoint tool generic, I think you just need to use the 
> > checkpoint manager factory defined for the system in the config. Everything 
> > else here appears to be totally generic.

Concur.


> On March 21, 2014, 4:28 a.m., Chris Riccomini wrote:
> > samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManager.scala,
> >  line 59
> > <https://reviews.apache.org/r/19481/diff/2/?file=530757#file530757line59>
> >
> >     Shouldn't need any of these to be vals if you convert the 
> > CheckpointTool to support arbitrary CheckpointManagers.

Yup, I've undone this change.


> On March 21, 2014, 4:28 a.m., Chris Riccomini wrote:
> > samza-shell/src/main/bash/checkpoint-tool.sh, line 19
> > <https://reviews.apache.org/r/19481/diff/2/?file=530758#file530758line19>
> >
> >     No need for log4j setting here. run-class.sh will auto-add it if it 
> > exists in the lib directory of a package. If it doesn't, it's up to the 
> > user to add an appropriate slf4j module (e.g. slf4j-simple, slf4j-avsl, 
> > etc), and configure it with SAMZA_OPTS, accordingly.

At the moment, in hello-samza, log4j is configured to send all logs to a file 
(which by default is called deploy/samza/undefined-samza-container-name.log). 
If the tool isn't working for some reason, that is not where I would expect to 
have to look for further information. Since utilities like checkpoint-tool are 
meant for interactive use, I'm keen for them to write their logs to stdout or 
stderr.

Thus I think it would be best to have a different logging config for 
command-line utilities than for containers. How do you think we should do that? 
Perhaps have two different log4j configs, and each script that calls 
run-class.sh should indicate whether it's interactive or daemon?


- Martin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19481/#review38050
-----------------------------------------------------------


On March 20, 2014, 11:04 p.m., Martin Kleppmann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19481/
> -----------------------------------------------------------
> 
> (Updated March 20, 2014, 11:04 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> SAMZA-180: Command-line tool for manipulating checkpoints
> 
> 
> Diffs
> -----
> 
>   build.gradle 8e369b83b7c4a658e1a3660efc92a24efadc9fc1 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 
> f3a75afa96a8dc64c98f37fdb88c63075ac2374b 
>   samza-kafka/src/main/resources/checkpoint-tool-log4j.properties 
> PRE-CREATION 
>   
> samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/CheckpointTool.scala
>  PRE-CREATION 
>   
> samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManager.scala
>  27b38b25dc6c34f3ef76d400370d1c857834e6a2 
>   samza-shell/src/main/bash/checkpoint-tool.sh PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/19481/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Martin Kleppmann
> 
>

Reply via email to