Hi, Yuanchi,

Your configuration looks good to me. Can you share the container logs from both the 0.9 and the 0.10 containers?
Also, have you tried running checkpoint-tool.sh to read the checkpoint topic and see what it contains?

Thanks!

-Yi

On Tue, Mar 22, 2016 at 1:48 PM, Yuanchi Ning <ningyuanchi...@gmail.com> wrote:

> Hi Yi,
>
> Thanks for the help. Below are the checkpoint-related configs:
>
> ##################### Job config #####################
> job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
> job.name=trip-counter
> job.datacenter=sjc1
> job.environment=sandbox
> #job.coordinator.system=kafka #comment out in 0.9, uncomment in 0.10
> #job.coordinator.replication.factor=3 #comment out in 0.9, uncomment in 0.10
>
> ##################### Task config #####################
> task.class=com.uber.athena.TripCounterTask
> task.inputs=kafka.trip_details,kafka.hp-api-client_signups
> task.outputTripTopic=trip_count_details
> task.outputClientSignUpsTopic=client_sign_ups_count_details
> task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
> task.checkpoint.system=kafka
> task.checkpoint.replication.factor=3
>
> On Tue, Mar 22, 2016 at 1:33 PM, Yi Pan <nickpa...@gmail.com> wrote:
>
> > Hi, Yuanchi,
> >
> > Did you check your configuration of task.checkpoint.system? What config
> > values did you use in 0.9, and what is the current configuration in 0.10?
> > If you can share your config before and after the upgrade, plus the
> > container log from 0.10, we can be more helpful.
> >
> > Thanks!
> >
> > -Yi
> >
> > On Tue, Mar 22, 2016 at 1:19 PM, Yuanchi Ning <ningyuanchi...@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > When we tested upgrading our existing Samza job from 0.9 to 0.10, we saw
> > > our Kafka lag metric (KafkaSystemConsumerMetrics
> > > "messages-behind-high-watermark") stay at zero.
> > > Since we stopped the old job for a while and then restarted it with 0.10
> > > under the same name, the lag should at least spike at the beginning. In
> > > the application master we did see it picking up the same checkpoint
> > > topic, though.
> > > Any ideas? Thanks!
> > >
> > > Yuanchi
> > >
> > > --
> > > Yuanchi Ning
>
> --
> Yuanchi Ning
>
> Master of Information Technology
> Very Large Information System
> School of Computer Science
> Carnegie Mellon University
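
In case it helps when comparing the setup before and after the upgrade, here is a minimal Python sketch that diffs the checkpoint- and coordinator-related keys between two properties files. The config text is copied from the properties quoted in this thread, with the trailing "#comment out ..." annotations dropped. The function names and the choice of key prefixes are illustrative assumptions, not part of Samza.

```python
# Key prefixes worth comparing across the upgrade. This is an illustrative
# choice based on the keys quoted in the thread, not an official Samza list.
PREFIXES = ("task.checkpoint.", "job.coordinator.")


def parse_properties(text):
    """Parse simple key=value lines; lines starting with '#' are comments.

    Text after the first '=' is kept whole, so a trailing '#...' on a
    value line would become part of the value rather than a comment.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def diff_checkpoint_settings(old_text, new_text):
    """Return {key: (old_value, new_value)} for relevant keys that differ."""
    old, new = parse_properties(old_text), parse_properties(new_text)
    keys = sorted(k for k in set(old) | set(new) if k.startswith(PREFIXES))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# 0.9 config as quoted in the thread: coordinator keys commented out.
CONFIG_09 = """\
job.name=trip-counter
#job.coordinator.system=kafka
#job.coordinator.replication.factor=3
task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
task.checkpoint.system=kafka
task.checkpoint.replication.factor=3
"""

# 0.10 config: the same file with the coordinator keys uncommented.
CONFIG_010 = CONFIG_09.replace("#job.coordinator.", "job.coordinator.")

if __name__ == "__main__":
    changes = diff_checkpoint_settings(CONFIG_09, CONFIG_010)
    for key, (before, after) in changes.items():
        print(f"{key}: 0.9={before!r} 0.10={after!r}")
```

With the configs as quoted, only the two job.coordinator.* keys differ; the task.checkpoint.* settings are identical in both versions, which is consistent with the configuration itself not being the obvious culprit.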