Hi guys,

I have a really strange problem with our InfluxDB server. First of all, we are 
running the latest version, 1.0.2. We use InfluxDB to store performance 
statistics from around 40k devices, collected every 5 minutes. The 
server in question has 40GB RAM and 16 CPUs. We keep data for 7 days and after 
that we use a CQ to downsample it and store it for 3 months.
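
For a sense of scale, the write load implied by these numbers can be worked out roughly as below. This is only a back-of-the-envelope sketch: the constant names are made up for illustration, and it assumes exactly one point per device per 5-minute poll.

```python
# Rough write-load arithmetic for the setup described above.
# Assumption: each device produces exactly one point per 5-minute poll.
DEVICES = 40_000
POLL_INTERVAL_S = 5 * 60
RETENTION_DAYS = 7

polls_per_day = 24 * 60 * 60 // POLL_INTERVAL_S          # polls per device per day
points_per_second = DEVICES / POLL_INTERVAL_S            # average ingest rate
points_per_day = DEVICES * polls_per_day
points_in_retention = points_per_day * RETENTION_DAYS    # points kept before downsampling

print(f"{points_per_second:.1f} points/s on average")
print(f"{points_per_day:,} points/day")
print(f"{points_in_retention:,} points in the 7-day window")
```

Each point carries many fields, so the number of stored field values is a large multiple of these point counts.
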
An example of one measurement looks like this:

cm,mac=a2faa63e1c00,status=1 host_if="C5/1/4/UB",sw_rev="EPC3008",model="EPC3008-v302r125531-101220c",fl_ins=5,fl_miss=256,fl_padj=15,fl_crc=0,fl_flap=37,fl_hit=19482,fl_ltime="Sep 22 12:44:08",status_us2="sta",cw_good_us2=157500,cw_uncorr_us2=507,cw_corr_us2=19064,tx_pwr_us2=45.00,snr_us2=31.41,rx_pwr_us2=29.00,status_us3="sta",cw_good_us3=237573,cw_uncorr_us3=2,cw_corr_us3=6909,tx_pwr_us3=45.00,snr_us3=34.18,rx_pwr_us3=29.00,cm_ip="172.16.11.15",mtc_mode=1,wideband_capable=1,prim_ds="Mo5/1/1:9",init_reason="POWER_ON",tto="6h11m",docsIfSigQUncorrectables.49=31,docsIfSigQSignalNoise.48=390,docsIfSigQCorrecteds.53=16281,docsIfSigQCorrecteds.50=16003,docsIfSigQUncorrectables.51=144,docsIfSigQUncorrectables.48=179,docsIfSigQUncorrectables.3=18,docsIfSigQSignalNoise.3=389,docsIfSigQSignalNoise.54=398,docsIfDownChannelPower.50=0,docsIfDownChannelPower.49=-6,docsIfSigQUnerroreds.51=765089373,docsIfSigQCorrecteds.3=17400,docsIfSigQCorrecteds.49=16433,docsIfSigQSignalNoise.52=399,docsIfDownChannelPower.48=-18,ifHCOutOctets.1=368789007,docsIfDownChannelPower.54=-9,docsIfSigQCorrecteds.54=16376,ifHCInOctets.1=48467216,docsIfSigQUnerroreds.52=765083145,docsIfSigQCorrecteds.48=17615,docsIfSigQSignalNoise.51=394,docsIfSigQUnerroreds.50=765097168,docsIfSigQCorrecteds.51=16009,docsIfSigQSignalNoise.53=399,docsIfDownChannelPower.53=-2,docsIfSigQUnerroreds.53=765074315,docsIfSigQSignalNoise.50=393,docsIfSigQUnerroreds.48=765110628,docsIfSigQUncorrectables.53=13,docsIfSigQUnerroreds.3=765195092,docsIfSigQUncorrectables.54=74,docsIfDownChannelPower.51=0,docsIfSigQUnerroreds.54=765068049,docsIfDownChannelPower.3=-14,docsIfSigQSignalNoise.49=394,docsIfDownChannelPower.52=1,docsIfSigQUnerroreds.49=765105625,docsIfSigQCorrecteds.52=15876,docsIfSigQUncorrectables.50=38,docsIfSigQUncorrectables.52=16 1474563439
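
To make the structure of such a point easier to see (measurement, tag set, field set, timestamp), here is a minimal sketch that splits one line-protocol point into its parts. The helper names `split_unquoted` and `summarize_point` are made up for this illustration. It assumes a trailing timestamp is always present and that the measurement and tags contain no escaped commas, and it is not a full line-protocol parser. The sample is an abbreviated copy of the point above.

```python
def split_unquoted(text, sep):
    """Split text on sep, ignoring separators inside double-quoted strings."""
    parts, current, in_quotes, escaped = [], [], False, False
    for ch in text:
        if escaped:                      # character after a backslash is taken literally
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == sep and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def summarize_point(line):
    """Return (measurement, tags, fields, timestamp) for one point.

    Simplification: expects exactly "key fields timestamp".
    """
    key, field_part, timestamp = split_unquoted(line.strip(), " ")
    measurement, *tag_pairs = key.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = dict(pair.split("=", 1) for pair in split_unquoted(field_part, ","))
    return measurement, tags, fields, int(timestamp)


# Abbreviated version of the point shown above.
sample = (
    'cm,mac=a2faa63e1c00,status=1 '
    'host_if="C5/1/4/UB",fl_ins=5,fl_ltime="Sep 22 12:44:08",snr_us2=31.41 '
    '1474563439'
)
measurement, tags, fields, ts = summarize_point(sample)
string_fields = [name for name, value in fields.items() if value.startswith('"')]
print(measurement, tags, len(fields), string_fields, ts)
```

Run against the full point, the same function gives the number of fields per point and shows which ones are stored as strings rather than numbers.
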

Now, we have two issues. The first is that the server restarts every 24h due to 
OOM; look at this:

Sep 29 20:34:21 node1 kernel: influxd invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Sep 30 20:04:15 node1 kernel: influxd invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Oct  3 20:04:32 node1 kernel: influxd invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Oct  4 20:04:35 node1 kernel: influxd invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
Oct  5 20:04:45 node1 kernel: influxd invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Oct  6 20:04:46 node1 kernel: influxd invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
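
To show how regular the kills are, the timestamps from the log lines above can be compared with a short script. This is just a sketch: syslog does not record the year, so 2016 is an assumption used only for the date arithmetic, and `parse` is a made-up helper name. Month parsing relies on an English/C locale.

```python
from datetime import datetime

# Timestamps copied from the kernel log lines above.
OOM_TIMES = [
    "Sep 29 20:34:21",
    "Sep 30 20:04:15",
    "Oct  3 20:04:32",
    "Oct  4 20:04:35",
    "Oct  5 20:04:45",
    "Oct  6 20:04:46",
]


def parse(stamp, year=2016):
    # Assumption: the year is not in the log, so it is supplied here.
    # split()/join collapses the double space syslog uses for single-digit days.
    normalized = " ".join(stamp.split())
    return datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S")


events = [parse(s) for s in OOM_TIMES]
for prev, cur in zip(events, events[1:]):
    print(f"{cur:%b %d %H:%M:%S}  gap since previous kill: {cur - prev}")
```

Apart from the first entry, every kill lands at 20:04 within about 30 seconds. The excerpt has no entries for Oct 1 and Oct 2.
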

and so on. The other issue is that some data is missing. For example, we see 
that a whole measurement is missing, although I am pretty sure that it was 
written: we write to a file at the same time as we write to InfluxDB, and we 
don't see any errors from InfluxDB. We write 1000 measurements per batch.
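
One check that can be done from the local file alone is to look for points that share the same measurement, tag set and timestamp. InfluxDB treats such points as the same point, so a later write silently replaces the earlier values without reporting an error. The sketch below is only an illustration under assumptions: `duplicate_point_keys` is a made-up helper, the file is assumed to hold one line-protocol point per line with an explicit timestamp, and tags are assumed to be written in the same order with no escaped spaces.

```python
from collections import Counter


def duplicate_point_keys(lines):
    """Return {(series_key, timestamp): count} for keys seen more than once."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                              # skip blank lines and comments
        series_key = line.split(" ", 1)[0]        # measurement plus tag set
        timestamp = line.rsplit(" ", 1)[-1]       # trailing timestamp
        counts[(series_key, timestamp)] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Tiny hypothetical input: the first and third lines collide.
lines = [
    "cm,mac=a2faa63e1c00,status=1 fl_ins=5 1474563439",
    "cm,mac=a2faa63e1c01,status=1 fl_ins=7 1474563439",
    "cm,mac=a2faa63e1c00,status=1 fl_ins=6 1474563439",
]
dups = duplicate_point_keys(lines)
print(dups)
```

An empty result would rule this explanation out for the data in the file. It would not explain points that are missing for other reasons.
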

I would really appreciate some help in resolving this issue since everything 
else works perfectly.

Thank you.
