I found a place where I suppose that flag could go
(testdata/cluster/node_templates/cdh5/etc/kudu/), but I'm not sure
what I want to set it at. If "maximum error 197431 us" is to be
believed, it needs to be at least 0.2 seconds, and if
https://kudu.apache.org/docs/troubleshooting.html is to be believed,
it is already 10 seconds.
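
As a unit check: ntptime reports microseconds, and the flag is also in microseconds. The 10-second default is therefore 10,000,000 us. The observed 197431 us is about 0.197 s, roughly 2% of that default. The minimal Python sketch below pulls the "maximum error" field out of ntptime output and compares it with a threshold. It assumes the output format shown in the original report. The function and constant names are hypothetical and are not part of Kudu or ntp.

```python
import re

# Hypothetical constant: the 10 s default cited from the Kudu troubleshooting
# page, expressed in microseconds like --max_clock_sync_error_usec.
DEFAULT_MAX_CLOCK_SYNC_ERROR_USEC = 10 * 1_000_000

# Assumes ntptime prints lines like "maximum error 197431 us, ...".
_MAX_ERROR_RE = re.compile(r"maximum error (\d+) us")


def parse_max_error_usec(ntptime_output: str) -> int:
    """Return the first 'maximum error N us' value found in ntptime output."""
    match = _MAX_ERROR_RE.search(ntptime_output)
    if match is None:
        raise ValueError("no 'maximum error' field in ntptime output")
    return int(match.group(1))


def within_threshold(max_error_usec: int,
                     threshold_usec: int = DEFAULT_MAX_CLOCK_SYNC_ERROR_USEC) -> bool:
    """True if the reported maximum error is below the threshold."""
    return max_error_usec < threshold_usec


# Excerpt of the ntptime output from the original report.
sample = """\
ntp_gettime() returns code 0 (OK)
  time dbd66a6a.59bca948  Wed, Nov 16 2016  5:17:30.350, (.350535824),
  maximum error 197431 us, estimated error 71015 us, TAI offset 0
"""

err = parse_max_error_usec(sample)
print(err, err / 1_000_000, within_threshold(err))
# prints: 197431 0.197431 True
```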

I added some more logging to try and see what is going on, and if I
hit this again I'll re-open the thread. For now, I am not planning to
submit a patch to "fix" it because I'm not sure increasing the number
is the real solution.
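
Matthew's suggestion further down the thread was to print ntptime output at the beginning of each job. That could look roughly like this minimal Python sketch. It assumes ntptime is on the PATH. If it is not, the helper logs a warning and returns None instead of failing the job. The helper name log_ntptime is hypothetical.

```python
import logging
import shutil
import subprocess
from typing import Optional, Sequence

log = logging.getLogger("ntp_check")


def log_ntptime(cmd: Sequence[str] = ("ntptime",)) -> Optional[str]:
    """Run ntptime (or another command), log its output, and return stdout.

    Returns None if the command cannot be found, so a missing ntptime
    does not abort the job.
    """
    if shutil.which(cmd[0]) is None:
        log.warning("%s not found on PATH; skipping clock check", cmd[0])
        return None
    # check=False: record the output even if the command exits non-zero.
    result = subprocess.run(list(cmd), capture_output=True, text=True, check=False)
    log.info("%s exited %d:\n%s", cmd[0], result.returncode, result.stdout)
    if result.stderr:
        log.info("%s stderr:\n%s", cmd[0], result.stderr)
    return result.stdout


if __name__ == "__main__":
    # Intended to run at job start, before the Kudu daemons are launched.
    logging.basicConfig(level=logging.INFO)
    log_ntptime()
```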

On Wed, Nov 16, 2016 at 11:00 AM, Matthew Jacobs <[email protected]> wrote:
> According to the error message, it looks like we can specify the
> '--max_clock_sync_error_usec' flag when starting the Kudu processes.
> We may want to start by printing ntptime output at the beginning of
> jobs so we can see how far off it is. If it's off by days then maybe
> changing the error isn't a good idea, and we'll need to figure out
> something else.
>
> On Wed, Nov 16, 2016 at 10:55 AM, Jim Apple <[email protected]> wrote:
>> How do we bump up the allowable error?
>>
>> On Wed, Nov 16, 2016 at 10:20 AM, Matthew Jacobs <[email protected]> wrote:
>>> I asked on the Kudu slack channel, they have seen issues where freshly
>>> provisioned ec2 nodes take some time for ntp to quiesce, but they
>>> didn't have a sense of how long that might take. If you checked
>>> ntptime after the job failed, ntp may have settled by then. We
>>> can probably consider bumping up the allowable error.
>>>
>>> On Wed, Nov 16, 2016 at 9:24 AM, Jim Apple <[email protected]> wrote:
>>>> This is the second time I have seen it, but it doesn't happen every
>>>> time. It could very well be a difference on ec2; already I've seen
>>>> some bugs due to my ec2 instances being in the Etc/UTC timezone while most
>>>> Impala developers work in America/Los_Angeles.
>>>>
>>>> On Wed, Nov 16, 2016 at 9:10 AM, Matthew Jacobs <[email protected]> wrote:
>>>>> No problem. If this happens again we should ask the Kudu developers. I
>>>>> haven't seen this before - I wonder if it could be some weirdness on
>>>>> ec2...
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Wed, Nov 16, 2016 at 9:01 AM, Jim Apple <[email protected]> wrote:
>>>>>> Thank you for your help!
>>>>>>
>>>>>> This was on an AWS machine that has expired, but I can see from the
>>>>>> logs that "IMPALA_KUDU_VERSION=88b023" and
>>>>>> "KUDU_JAVA_VERSION=1.0.0-SNAPSHOT" and "Downloading
>>>>>> kudu-python-0.3.0.tar.gz" and "URL
>>>>>> https://native-toolchain.s3.amazonaws.com/build/264-e9d44349ba/kudu/88b023-gcc-4.9.2/kudu-88b023-gcc-4.9.2-ec2-package-ubuntu-14-04.tar.gz".
>>>>>> I'll add "ps aux | grep kudu" to the logging this machine does on
>>>>>> error, so we'll have it next time, but I did "ps -Afly" on exit, and it
>>>>>> looks like there were no kudu processes running.
>>>>>>
>>>>>> On Wed, Nov 16, 2016 at 8:52 AM, Matthew Jacobs <[email protected]> wrote:
>>>>>>> Can you check which version of the client you're building against
>>>>>>> (KUDU_VERSION env var) vs what Kudu version is running (ps aux | grep
>>>>>>> kudu)?
>>>>>>>
>>>>>>> On Wed, Nov 16, 2016 at 8:48 AM, Jim Apple <[email protected]> wrote:
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>> On Wed, Nov 16, 2016 at 7:45 AM, Matthew Jacobs <[email protected]> wrote:
>>>>>>>>> Do you have NTP installed?
>>>>>>>>>
>>>>>>>>> On Tue, Nov 15, 2016 at 9:22 PM, Jim Apple <[email protected]> wrote:
>>>>>>>>>> I have a machine where Kudu failed to start:
>>>>>>>>>>
>>>>>>>>>> F1116 05:02:00.173629 71098 tablet_server_main.cc:64] Check failed:
>>>>>>>>>> _s.ok() Bad status: Service unavailable: Cannot initialize clock:
>>>>>>>>>> Error reading clock. Clock considered unsynchronized
>>>>>>>>>>
>>>>>>>>>> https://kudu.apache.org/docs/troubleshooting.html says:
>>>>>>>>>>
>>>>>>>>>> "For the master and tablet server daemons, the server’s clock must be
>>>>>>>>>> synchronized using NTP. In addition, the maximum clock error (not to
>>>>>>>>>> be mistaken with the estimated error) be below a configurable
>>>>>>>>>> threshold. The default value is 10 seconds, but it can be set with the
>>>>>>>>>> flag --max_clock_sync_error_usec."
>>>>>>>>>>
>>>>>>>>>> and
>>>>>>>>>>
>>>>>>>>>> "If NTP is installed the user can monitor the synchronization status
>>>>>>>>>> by running ntptime. The relevant value is what is reported for maximum
>>>>>>>>>> error."
>>>>>>>>>>
>>>>>>>>>> ntptime reports:
>>>>>>>>>>
>>>>>>>>>> ntp_gettime() returns code 0 (OK)
>>>>>>>>>>   time dbd66a6a.59bca948  Wed, Nov 16 2016  5:17:30.350, (.350535824),
>>>>>>>>>>   maximum error 197431 us, estimated error 71015 us, TAI offset 0
>>>>>>>>>> ntp_adjtime() returns code 0 (OK)
>>>>>>>>>>   modes 0x0 (),
>>>>>>>>>>   offset 74989.459 us, frequency 19.950 ppm, interval 1 s,
>>>>>>>>>>   maximum error 197431 us, estimated error 71015 us,
>>>>>>>>>>   status 0x2001 (PLL,NANO),
>>>>>>>>>>   time constant 6, precision 0.001 us, tolerance 500 ppm,
>>>>>>>>>>
>>>>>>>>>> So it looks like this error is anticipated, but the expected
>>>>>>>>>> conditions for it to occur are absent. Any ideas what could be going
>>>>>>>>>> on here? This is with a recent checkout of Impala master.
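
The status word in the ntp_adjtime() output above can be decoded with the bit values from Linux's <sys/timex.h>. 0x2001 is STA_PLL | STA_NANO, which matches the "(PLL,NANO)" that ntptime prints, and STA_UNSYNC (0x0040) is clear. The kernel makes ntp_adjtime() return TIME_ERROR instead of TIME_OK while STA_UNSYNC or STA_CLOCKERR is set. Assume, though this was not confirmed in the thread, that Kudu's "Clock considered unsynchronized" message reflects that TIME_ERROR return rather than the --max_clock_sync_error_usec comparison. Then the likely story is that the clock was unsynchronized at 05:02 when the tablet server started and had synchronized by 05:17 when ntptime ran. That would fit the report that freshly provisioned ec2 nodes take a while for ntp to settle. The helper names in this Python sketch are hypothetical, and the error check is simplified (PPS-related conditions are ignored).

```python
from typing import List

# Status bits from Linux <sys/timex.h>, using the short names ntptime prints.
STATUS_BITS = {
    0x0001: "PLL",
    0x0002: "PPSFREQ",
    0x0004: "PPSTIME",
    0x0008: "FLL",
    0x0010: "INS",
    0x0020: "DEL",
    0x0040: "UNSYNC",
    0x0080: "FREQHOLD",
    0x0100: "PPSSIGNAL",
    0x0200: "PPSJITTER",
    0x0400: "PPSWANDER",
    0x0800: "PPSERROR",
    0x1000: "CLOCKERR",
    0x2000: "NANO",
    0x4000: "MODE",
    0x8000: "CLK",
}
STA_UNSYNC = 0x0040
STA_CLOCKERR = 0x1000


def decode_status(status: int) -> List[str]:
    """Names of the status bits that are set, in ascending bit order."""
    return [name for bit, name in sorted(STATUS_BITS.items()) if status & bit]


def looks_unsynchronized(status: int) -> bool:
    """Simplified kernel error check (PPS conditions ignored): while either
    bit is set, ntp_adjtime() returns TIME_ERROR rather than TIME_OK."""
    return bool(status & (STA_UNSYNC | STA_CLOCKERR))


# Status word from the ntptime run at 05:17, after the 05:02 failure.
print(decode_status(0x2001), looks_unsynchronized(0x2001))
# prints: ['PLL', 'NANO'] False
```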