Re: [influxdb] InfluxDB restarts every 24 hours and some data is missing

Sean Beckett Thu, 13 Oct 2016 14:15:32 -0700

> Now, we have two issues, one is that the server restarts every 24h due to
> OOM, look at this:


Does RAM usage spike every 24 hours, or does it grow slowly?
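
One way to tell a spike from slow growth is to sample the resident memory of the influxd process at a fixed interval and look at the shape of the series. The following is a minimal sketch, assuming a Linux host (the oom-killer log lines suggest one) and that you look up the influxd PID yourself. The function names and the 60-second interval are illustrative, not part of InfluxDB.

```python
import re
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_vmrss_kb(fh.read())


def sample_rss(pid, interval_s=60, count=10):
    """Collect (unix_time, rss_kb) pairs; interval and count are arbitrary defaults."""
    samples = []
    for _ in range(count):
        samples.append((int(time.time()), read_rss_kb(pid)))
        time.sleep(interval_s)
    return samples
```

Running sample_rss over a full day and plotting the result should show whether memory climbs steadily or jumps shortly before each OOM kill.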

One of your tags is a MAC address. That has very high cardinality. How many
series are there in your system?
http://docs.influxdata.com/influxdb/v1.0/troubleshooting/frequently-asked-questions/#why-does-series-cardinality-matter
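
Each distinct combination of measurement and tag set is one series, so with mac as a tag every device contributes at least one series per status value. Since you also write each batch to a file, a rough count can be taken from that file. This is a minimal sketch, assuming the file holds one line-protocol point per line. It handles backslash escapes but is not a full line-protocol parser, and the function names are hypothetical.

```python
import re

# A comma that is not preceded by a backslash separates measurement and tags.
_UNESCAPED_COMMA = re.compile(r"(?<!\\),")


def series_key(line):
    """Return a normalized 'measurement,tag=value,...' key for one point."""
    i = 0
    # The series key ends at the first space that is not backslash-escaped.
    while i < len(line):
        if line[i] == "\\":
            i += 2
            continue
        if line[i] == " ":
            break
        i += 1
    measurement, *tags = _UNESCAPED_COMMA.split(line[:i])
    # Sort tags so that the same tag set in a different order maps to one key.
    return ",".join([measurement] + sorted(tags))


def count_series(lines):
    """Count distinct series keys, skipping blank lines and '#' comments."""
    return len({series_key(l) for l in lines
                if l.strip() and not l.startswith("#")})
```

For example, count_series(open(path)) over a day of batch files gives a lower bound on the number of series the server has to keep indexed.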

You also have 71 fields per point. Are you running any CQs to downsample
them?

What are the retention policy settings? (SHOW RETENTION POLICIES ON
<database>)

> The other issue is that some data are missing. For example we see the
> whole measurement is missing, while I am pretty sure that it's written
> because at the same time when we write to influx we write to a file and we
> don't see any errors from influx. We write 1000 measurements in one batch.

Do the InfluxDB logs show successful writes? Is the client receiving a 204
response to the writes?
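
To check the response code explicitly, the client can record the HTTP status of every batch it posts to the /write endpoint; 204 means the batch was accepted. The following is a minimal sketch using only the Python standard library. The base URL, the database name, and the function names are assumptions for illustration. It passes precision=s because the sample point carries a timestamp in seconds (1474563439); without a precision parameter InfluxDB interprets timestamps as nanoseconds, which would place that point in 1970.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_write_request(base_url, database, lines, precision="s"):
    """Build a POST to the InfluxDB 1.x /write endpoint for a batch of points."""
    query = urllib.parse.urlencode({"db": database, "precision": precision})
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/write?{query}", data=body, method="POST"
    )


def send_batch(request, timeout=10):
    """Send the batch and return (status_code, response_body)."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses carry an error message in the body.
        return err.code, err.read().decode("utf-8", "replace")
```

Logging any status other than 204 together with the returned body next to the batch file would show whether the missing points were rejected or never sent.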

What does "see the whole measurement is missing" mean? Can you show the
actual CLI queries? This could be a query syntax issue.


On Thu, Oct 13, 2016 at 8:49 AM, Pavel <[email protected]> wrote:

> Hi guys,
>
> I have a really strange problem with our InfluxDB server. First of all, we
> are running the latest version, 1.0.2. We use InfluxDB to store some
> performance statistics from around 40k devices. We do this regularly every
> 5 minutes. The server in question has 40GB RAM and 16 CPUs. We keep data
> for 7 days and after that we use a CQ to downsample and store it for 3 months.
> Example of one measurement looks like this:
>
> cm,mac=a2faa63e1c00,status=1 host_if="C5/1/4/UB",sw_rev="
> EPC3008",model="EPC3008-v302r125531-101220c",fl_ins=5,
> fl_miss=256,fl_padj=15,fl_crc=0,fl_flap=37,fl_hit=19482,fl_ltime="Sep 22
> 12:44:08",status_us2="sta",cw_good_us2=157500,cw_uncorr_us2=
> 507,cw_corr_us2=19064,tx_pwr_us2=45.00,snr_us2=31.41,rx_
> pwr_us2=29.00,status_us3="sta",cw_good_us3=237573,cw_uncorr_
> us3=2,cw_corr_us3=6909,tx_pwr_us3=45.00,snr_us3=34.18,rx_
> pwr_us3=29.00,cm_ip="172.16.11.15",mtc_mode=1,wideband_
> capable=1,prim_ds="Mo5/1/1:9",init_reason="POWER_ON",tto="6h11m",
> docsIfSigQUncorrectables.49=31,docsIfSigQSignalNoise.48=
> 390,docsIfSigQCorrecteds.53=16281,docsIfSigQCorrecteds.50=16003,
> docsIfSigQUncorrectables.51=144,docsIfSigQUncorrectables.48=179,
> docsIfSigQUncorrectables.3=18,docsIfSigQSignalNoise.3=389,
> docsIfSigQSignalNoise.54=398,docsIfDownChannelPower.50=0,
> docsIfDownChannelPower.49=-6,docsIfSigQUnerroreds.51=765089373,
> docsIfSigQCorrecteds.3=17400,docsIfSigQCorrecteds.49=16433,
> docsIfSigQSignalNoise.52=399,docsIfDownChannelPower.48=-18,
> ifHCOutOctets.1=368789007,docsIfDownChannelPower.54=-9,
> docsIfSigQCorrecteds.54=16376,ifHCInOctets.1=48467216,
> docsIfSigQUnerroreds.52=765083145,docsIfSigQCorrecteds.48=17615,
> docsIfSigQSignalNoise.51=394,docsIfSigQUnerroreds.50=765097168,
> docsIfSigQCorrecteds.51=16009,docsIfSigQSignalNoise.53=399,
> docsIfDownChannelPower.53=-2,docsIfSigQUnerroreds.53=765074315,
> docsIfSigQSignalNoise.50=393,docsIfSigQUnerroreds.48=765110628,
> docsIfSigQUncorrectables.53=13,docsIfSigQUnerroreds.3=765195092,
> docsIfSigQUncorrectables.54=74,docsIfDownChannelPower.51=
> 0,docsIfSigQUnerroreds.54=765068049,docsIfDownChannelPower.3=-14,
> docsIfSigQSignalNoise.49=394,docsIfDownChannelPower.52=1,
> docsIfSigQUnerroreds.49=765105625,docsIfSigQCorrecteds.52=15876,
> docsIfSigQUncorrectables.50=38,docsIfSigQUncorrectables.52=16 1474563439
>
> Now, we have two issues, one is that the server restarts every 24h due to
> OOM, look at this:
>
> Sep 29 20:34:21 node1 kernel: influxd invoked oom-killer:
> gfp_mask=0x280da, order=0, oom_score_adj=0
> Sep 30 20:04:15 node1 kernel: influxd invoked oom-killer:
> gfp_mask=0x280da, order=0, oom_score_adj=0
> Oct  3 20:04:32 node1 kernel: influxd invoked oom-killer:
> gfp_mask=0x280da, order=0, oom_score_adj=0
> Oct  4 20:04:35 node1 kernel: influxd invoked oom-killer:
> gfp_mask=0x200da, order=0, oom_score_adj=0
> Oct  5 20:04:45 node1 kernel: influxd invoked oom-killer:
> gfp_mask=0x280da, order=0, oom_score_adj=0
> Oct  6 20:04:46 node1 kernel: influxd invoked oom-killer:
> gfp_mask=0x280da, order=0, oom_score_adj=0
>
> and so on. The other issue is that some data are missing. For example we
> see the whole measurement is missing, while I am pretty sure that it's
> written because at the same time when we write to influx we write to a file
> and we don't see any errors from influx. We write 1000 measurements in one
> batch.
>
> I would really appreciate some help in resolving this issue since
> everything else works perfectly.
>
> Thank you.
>
> --
> Remember to include the version number!
> ---
> You received this message because you are subscribed to the Google Groups
> "InfluxData" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/influxdb.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/influxdb/70f0f7cf-34b9-4498-9ba7-385cbfd9d6e0%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Sean Beckett
Director of Support and Professional Services
InfluxDB

