@Robert

- I am not sure the RocksDB problems are closely related to the version upgrade; I have been experiencing similar problems for months. I think this is usually not a huge problem on YARN; it mostly hurts in standalone clusters.
- Also, the YARN memory limits are tricky to configure nicely, as the right value depends a lot on how RocksDB handles native memory. Its native memory usage seems to grow quite a lot over time.

Flavio Pompermaier <[email protected]> wrote (on Fri, Dec 16, 2016, 10:56):

> I personally think that it should be quite important to have a fix also for
> the ES connector (https://issues.apache.org/jira/browse/FLINK-5122).
>
> Best,
> Flavio
>
> On Fri, Dec 16, 2016 at 10:43 AM, Robert Metzger <[email protected]> wrote:
>
> > I'm not sure if we can release the release candidate like this, because I'm
> > running into two issues probably related to a recent rocksdb version
> > upgrade.
> >
> > This is my list of points so far:
> >
> > - Checked the staging repository. Quickstarts and Hadoop 1 / 2 are okay.
> > - Built a job against the staging repository
> > - Binaries deploy on a kerberized HA YARN / HDFS setup. Ran the KMeans and
> >   WordCount batch jobs
> > - Executed a heavy, misbehaved streaming job for a few hours. While running
> >   that job, I found that:
> >   - Not all checkpoint directories are cleaned up in HDFS (I use the async
> >     rocksdb statebackend)
> >   - segfaults from rocksdb (8 segfaults in ~3 hrs, but they were all
> >     happening in the last minutes)
> >   - "beyond physical memory limits" container killings from YARN (I know we
> >     can configure this, I just wonder whether we should change the default
> >     value)
> >   - the segfaults and memory limits caused the job to not run anymore in
> >     the end because it was in a constant retry loop.
> > - This is a non-blocking issue I found during the testing:
> >   https://issues.apache.org/jira/browse/FLINK-5345
> > - This is also a non-blocking issue for 1.1.4 (fixed for 1.2):
> >   https://issues.apache.org/jira/browse/FLINK-4631
> >
> > Let me know if we should release anyways or fix these issues first.
> >
> > On Tue, Dec 13, 2016 at 11:04 PM, Ufuk Celebi <[email protected]> wrote:
> >
> > > Dear Flink community,
> > >
> > > Please vote on releasing the following candidate as Apache Flink version
> > > 1.1.4.
> > >
> > > The commit to be voted on:
> > > 2cd6579 (http://git-wip-us.apache.org/repos/asf/flink/commit/2cd6579)
> > >
> > > Branch:
> > > release-1.1.4-rc3
> > > (https://git1-us-west.apache.org/repos/asf/flink/repo?p=flink.git;a=shortlog;h=refs/heads/release-1.1.4-rc3)
> > >
> > > The release artifacts to be voted on can be found at:
> > > http://people.apache.org/~uce/flink-1.1.4-rc3/
> > >
> > > The release artifacts are signed with the key with fingerprint 9D403309:
> > > http://www.apache.org/dist/flink/KEYS
> > >
> > > The staging repository for this release can be found at:
> > > https://repository.apache.org/content/repositories/orgapacheflink-1109
> > >
> > > -------------------------------------------------------------
> > >
> > > The voting time is at least three days and the vote passes if a
> > > majority of at least three +1 PMC votes are cast. The vote ends earliest
> > > on Friday, December 16th, 2016, at 11 PM (CET)/2 PM (PST).
> > >
> > > [ ] +1 Release this package as Apache Flink 1.1.4
> > > [ ] -1 Do not release this package, because ...
