One more question: how long was the job stopped before you restarted it after the upgrade? Is your checkpoint topic configured as a log-compacted topic or as a time-retention topic?
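One way to check the second question is to describe the checkpoint topic with the Kafka CLI. The snippet below is only a sketch under assumptions: the topic name follows the `__samza_checkpoint_ver_1_for_<job.name>_<job.id>` pattern, `job.id` is left at its default of `1`, and the ZooKeeper address is a placeholder. Verify the actual topic name against what the application master logs.

```shell
# Assumed values; replace with your own.
JOB_NAME="trip-counter"        # job.name from the config quoted in this thread
JOB_ID="1"                     # assumption: job.id left at its default
ZOOKEEPER="localhost:2181"     # hypothetical ZooKeeper address

# Assumed naming convention for the Kafka checkpoint topic.
CHECKPOINT_TOPIC="__samza_checkpoint_ver_1_for_${JOB_NAME}_${JOB_ID}"
echo "checkpoint topic: ${CHECKPOINT_TOPIC}"

# Describe the topic only if the Kafka CLI is available on PATH.
# Newer Kafka releases take --bootstrap-server instead of --zookeeper.
if command -v kafka-topics.sh >/dev/null 2>&1; then
  kafka-topics.sh --describe --zookeeper "${ZOOKEEPER}" --topic "${CHECKPOINT_TOPIC}"
else
  echo "kafka-topics.sh not found on PATH; run the describe command from a Kafka install"
fi
```

If the `Configs:` field in the output contains `cleanup.policy=compact`, the topic is log-compacted. If it is empty, the topic falls back to the broker defaults, which normally means time-based retention.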
-Yi

On Wed, Mar 23, 2016 at 3:54 PM, Yi Pan <nickpa...@gmail.com> wrote:

> Hi, Yuanchi,
>
> Your configuration looks good to me. Can you share the container logs from
> 0.9 container and 0.10 container?
>
> Also, have you tried to run checkpoint-tool.sh to read from the checkpoint
> topic to see what's the content in the topic?
>
> Thanks!
>
> -Yi
>
> On Tue, Mar 22, 2016 at 1:48 PM, Yuanchi Ning <ningyuanchi...@gmail.com>
> wrote:
>
>> Hi Yi,
>>
>> Thanks for the help. Below are the checkpoint related configs:
>>
>> ##################### Job config #####################
>> job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
>> job.name=trip-counter
>> job.datacenter=sjc1
>> job.environment=sandbox
>> #job.coordinator.system=kafka #comment out in 0.9, uncomment in 0.10
>> #job.coordinator.replication.factor=3 #comment out in 0.9, uncomment in 0.10
>>
>> ##################### Task config #####################
>> task.class=com.uber.athena.TripCounterTask
>> task.inputs=kafka.trip_details,kafka.hp-api-client_signups
>> task.outputTripTopic=trip_count_details
>> task.outputClientSignUpsTopic=client_sign_ups_count_details
>> task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
>> task.checkpoint.system=kafka
>> task.checkpoint.replication.factor=3
>>
>> On Tue, Mar 22, 2016 at 1:33 PM, Yi Pan <nickpa...@gmail.com> wrote:
>>
>> > Hi, Yuanchi,
>> >
>> > Did you check your configuration of task.checkpoint.system? What are the
>> > config value you used in 0.9 and what's the current configuration in 0.10?
>> > If you can share your config before and after the upgrade, + the container
>> > log from 0.10, we can be more helpful.
>> >
>> > Thanks!
>> >
>> > -Yi
>> >
>> > On Tue, Mar 22, 2016 at 1:19 PM, Yuanchi Ning <ningyuanchi...@gmail.com>
>> > wrote:
>> >
>> > > Hi All,
>> > >
>> > > When we test upgrading our existing Samza job from 0.9 to 0.10, we saw our
>> > > Kafka Lag metric (KafkaSystemConsumerMetrics "messages-behind-high-watermark")
>> > > kept zero.
>> > > Since we stopped the old job for a while and then restart the job with 0.10
>> > > using the same name, the lag should at least spike at the beginning. In the
>> > > application master we did see it's picking up the same checkpoint topic
>> > > though.
>> > > Any ideas? thanks!
>> > >
>> > > Yuanchi
>> > >
>> > > --
>> > > Yuanchi Ning
>>
>> --
>> Yuanchi Ning
>>
>> Master of Information Technology
>> Very Large Information System
>> School of Computer Science
>> Carnegie Mellon University
>>
>> Mobile: (412)680-9774
>> Email: ningyuanchi...@gmail.com
>> yuanc...@cs.cmu.edu
>> yuanc...@andrew.cmu.edu