Re: Few basic Qs to put Impala in a prod-cluster with other OS components

2023-02-20 Thread Cliff Resnick
uild working. > > Are you suggesting using docker for production instead? I found some > license-related challenges with docker and had to see what would be the > cost implications. > > ~JAK > > On Mon, Feb 20, 2023 at 7:37 AM Cliff Resnick wrote: > >> The answer to

Re: Few basic Qs to put Impala in a prod-cluster with other OS components

2023-02-19 Thread Cliff Resnick
The answer to all three of your questions is yes. We build docker containers from the Apache source and use those in production. On Thu, Feb 16, 2023, 4:16 AM Arun J wrote: > Team, > > Looking to migrate from CDH 6.3 to the OpenSource stack. Managed to make > Impala 4.2 work with the help of

[jira] [Comment Edited] (FLINK-25672) FileSource enumerator remembers paths of all already processed files which can result in large state

2023-01-23 Thread Cliff Resnick (Jira)
[ https://issues.apache.org/jira/browse/FLINK-25672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679874#comment-17679874 ] Cliff Resnick edited comment on FLINK-25672 at 1/23/23 4:13 PM: I

[jira] [Commented] (FLINK-25672) FileSource enumerator remembers paths of all already processed files which can result in large state

2023-01-23 Thread Cliff Resnick (Jira)
[ https://issues.apache.org/jira/browse/FLINK-25672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679874#comment-17679874 ] Cliff Resnick commented on FLINK-25672: --- I imagine it may require a breaking change from what I

[jira] [Commented] (FLINK-25672) FileSource enumerator remembers paths of all already processed files which can result in large state

2023-01-23 Thread Cliff Resnick (Jira)
[ https://issues.apache.org/jira/browse/FLINK-25672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679868#comment-17679868 ] Cliff Resnick commented on FLINK-25672: --- This issue has turned into a real problem for us with our

Custom metrics in Stateful Functions

2021-04-27 Thread Cliff Resnick
We think Embedded Statefun is a nicer fit than Datastream for some problem domains, but one thing we miss is support for custom metrics/counters. Is there a way to access the Flink support? It looks like if we want custom metrics we'll need to roll our own.

Re: Partitioning Rules of Thumb

2020-04-25 Thread Cliff Resnick
. On Sat, Apr 25, 2020, 9:10 PM Cliff Resnick wrote: > That's pretty much it. Every now and then we get notice an instance is > scheduled for retirement, or maybe just goes flakey on its own, and we swap > it out. 3x replication has been plenty for us, and though we regularly back > up

Re: Partitioning Rules of Thumb

2020-04-25 Thread Cliff Resnick
nted stateless Impala Spot Fleet clusters for HLL and other >> compute oriented queries. " >> >> I cannot really find anything on the web that would compare Impala/Kudu >> to Snowflake and Redshift. Everything I see is about Snowflake, Redshift >> and BigQ

KeyedStream and chained forward operators

2020-04-21 Thread Cliff Resnick
I'm running a massive file sifting by timestamp DataStream job from s3. The basic job is: FileMonitor -> ContinuousFileReader -> MultipleFileOutputSink. The MultipleFileOutputSink sifts data based on timestamp to date-hour directories. It's a lot of data, so I'm using high parallelism, but I want

post-checkpoint watermark out of sync with event stream?

2020-04-14 Thread Cliff Resnick
We have an event-time pipeline that uses a ProcessFunction to accept events with an allowed lateness of a number of days. We use a BoundedOutOfOrdernessTimestampExtractor and our event stream has a long tail that occasionally exceeds our allowed lateness, in which case we drop the events. The logic

Re: Partitioning Rules of Thumb

2020-03-16 Thread Cliff Resnick
the scan > perf (including if the runtime filters pushed into the scans were filtering > most data before joins). > > On Mon, Mar 16, 2020 at 2:58 PM Boris Tyukin > wrote: > >> appreciate your thoughts, Cliff >> >> On Mon, Mar 16, 2020 at 11:18 AM Cliff Resni

Re: Partitioning Rules of Thumb

2020-03-16 Thread Cliff Resnick
ke Redshift and Snowflake. It is getting harder to explain >>>>> why :) >>>>> >>>>> My biggest gripe with Kudu besides known limitations is the management >>>>> of partitions and compaction process. For our largest tables, we just max >

Re: Partitioning Rules of Thumb

2020-03-10 Thread Cliff Resnick
This is a good conversation but I don't think the comparison with Snowflake is a fair one, at least from an older version of Snowflake (in my last job, about 5 years ago, I pretty much single-handedly scale tested Snowflake in exchange for a sweetheart pricing deal). Though Snowflake is closed

Re: GC overhead limit exceeded, memory full of DeleteOnExit hooks for S3a files

2020-01-27 Thread Cliff Resnick
I know from experience that Flink's shaded S3A FileSystem does not reference core-site.xml, though I don't remember offhand what file(s) it does reference. However, since it's shaded, maybe this could be fixed by building a Flink FS referencing 3.3.0? Last I checked I think it referenced 3.1.0.

Re: impala 3.3 and hive metastore

2020-01-21 Thread Cliff Resnick
e *remote *metastore host URIs? > The logs tell that your impalad doesn't find a hive-site.xml, and the > weird errors could be the result of misconfiguration regarding catalog > and statestore. > > HTH! > > On Tue, Jan 14, 2020 at 5:41 PM Cliff Resnick wrote: > >

Re: impala 3.3 and hive metastore

2020-01-14 Thread Cliff Resnick
l hive metastore db on the impalad instance, not just > the catalogd instance.' - this sounds weird and is unexpected, and > multiple metastores can easily lead to issues further down the line. > What was the end of the log when the impalad failed to start? > > On Tue, Jan 14,

impala 3.3 and hive metastore

2020-01-13 Thread Cliff Resnick
I just built Impala 3.3 from source with Kudu 1.11. Unlike previous versions, in 3.3 impalad would not start unless I installed a local hive metastore db on the impalad instance, not just the catalogd instance. I'm now getting a strange "IllegalArgumentException:null" error when creating kudu tables

[jira] [Commented] (IMPALA-5323) Support Kudu BINARY

2019-12-18 Thread Cliff Resnick (Jira)
[ https://issues.apache.org/jira/browse/IMPALA-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999373#comment-16999373 ] Cliff Resnick commented on IMPALA-5323: --- We store HLL intermediates in Kudu, and UTF-8 encoding

Re: Support for ISO-8859-1 strings with Java client

2019-12-17 Thread Cliff Resnick
ely, there doesn't appear to have been > any progress on IMPALA-5323, which would be the clearest path forward. > Maybe you could update that ticket with your use case and hopefully > get the attention of some Impala developers? > > On Mon, Dec 16, 2019 at 10:16 AM Cliff Resnick wrote: >

Re: Impala and kudu without HDFS

2019-09-10 Thread Cliff Resnick
We use a large Impala/Kudu cluster for our analytic reporting. HDFS is not on the critical path for this, and after experimenting with a cluster configuration without it, we simply added a co-located HDFS cluster. It turns out we use HDFS for our dimension staging, and I'm guessing Impala uses it

[jira] [Comment Edited] (FLINK-11947) Support MapState value schema evolution for RocksDB

2019-06-05 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856993#comment-16856993 ] Cliff Resnick edited comment on FLINK-11947 at 6/5/19 7:49 PM: --- [~klion26

[jira] [Commented] (FLINK-11947) Support MapState value schema evolution for RocksDB

2019-06-05 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856993#comment-16856993 ] Cliff Resnick commented on FLINK-11947: --- [~klion26] confirmed working! Sorry it took me so long

[jira] [Commented] (FLINK-11947) Support MapState value schema evolution for RocksDB

2019-06-03 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854459#comment-16854459 ] Cliff Resnick commented on FLINK-11947: --- I was not able to get to it on Friday, but will give

[jira] [Commented] (FLINK-11947) Support MapState value schema evolution for RocksDB

2019-05-30 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851810#comment-16851810 ] Cliff Resnick commented on FLINK-11947: --- [~klion26] great! I'll try it today. > Support MapSt

[jira] [Commented] (FLINK-11947) Support MapState value schema evolution for RocksDB

2019-05-29 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850743#comment-16850743 ] Cliff Resnick commented on FLINK-11947: --- Thanks [~klion26]. I look forward to testing your fix

[jira] [Commented] (FLINK-11947) Support MapState key / value schema evolution for RocksDB

2019-05-24 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847669#comment-16847669 ] Cliff Resnick commented on FLINK-11947: --- Does the deescalation from Blocker mean

[jira] [Comment Edited] (FLINK-12334) change to MockStreamTask breaks OneInputStreamOperatorTestHarness

2019-05-02 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831932#comment-16831932 ] Cliff Resnick edited comment on FLINK-12334 at 5/2/19 8:28 PM: --- Hi

[jira] [Comment Edited] (FLINK-12334) change to MockStreamTask breaks OneInputStreamOperatorTestHarness

2019-05-02 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831932#comment-16831932 ] Cliff Resnick edited comment on FLINK-12334 at 5/2/19 8:27 PM: --- Hi

[jira] [Comment Edited] (FLINK-12334) change to MockStreamTask breaks OneInputStreamOperatorTestHarness

2019-05-02 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831932#comment-16831932 ] Cliff Resnick edited comment on FLINK-12334 at 5/2/19 8:27 PM: --- Hi

[jira] [Comment Edited] (FLINK-12334) change to MockStreamTask breaks OneInputStreamOperatorTestHarness

2019-05-02 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831932#comment-16831932 ] Cliff Resnick edited comment on FLINK-12334 at 5/2/19 8:26 PM: --- Hi

[jira] [Commented] (FLINK-12334) change to MockStreamTask breaks OneInputStreamOperatorTestHarness

2019-05-02 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831932#comment-16831932 ] Cliff Resnick commented on FLINK-12334: --- Hi [~fan_li_ya] We have a custom StreamOperator

[jira] [Created] (FLINK-12334) change to MockStreamTask breaks OneInputStreamOperatorTestHarness

2019-04-25 Thread Cliff Resnick (JIRA)
Cliff Resnick created FLINK-12334: - Summary: change to MockStreamTask breaks OneInputStreamOperatorTestHarness Key: FLINK-12334 URL: https://issues.apache.org/jira/browse/FLINK-12334 Project: Flink

[jira] [Created] (FLINK-12334) change to MockStreamTask breaks OneInputStreamOperatorTestHarness

2019-04-25 Thread Cliff Resnick (JIRA)
Cliff Resnick created FLINK-12334: - Summary: change to MockStreamTask breaks OneInputStreamOperatorTestHarness Key: FLINK-12334 URL: https://issues.apache.org/jira/browse/FLINK-12334 Project: Flink

Re: State Migration with RocksDB MapState

2019-04-25 Thread Cliff Resnick
till bump into the same issue. > > Best, > Gordon > > On Wed, Apr 24, 2019 at 7:45 PM Cliff Resnick wrote: > >> Hi Gordon, >> >> I noticed there has been no movement on this issue and I'm wondering if I >> can find some way to work around this. >> M

Re: State Migration with RocksDB MapState

2019-04-24 Thread Cliff Resnick
s definitely possible. > > Cheers, > Gordon > > [1] https://issues.apache.org/jira/browse/FLINK-11947 > > On Mon, Mar 18, 2019 at 11:20 AM Cliff Resnick wrote: > >> After trying out state migration in 1.8 rc2 I ran into this hard stop >> below. The comment do

[jira] [Commented] (FLINK-11947) Support MapState key / value schema evolution for RocksDB

2019-03-18 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795072#comment-16795072 ] Cliff Resnick commented on FLINK-11947: --- Thanks for the explanation, that makes sense. However

State Migration with RocksDB MapState

2019-03-17 Thread Cliff Resnick
After trying out state migration in 1.8 rc2 I ran into this hard stop below. The comment does not give an indication why rocksdb map state cannot be migrated, and I'm wondering what the status is, since we do need this functionality and would like to know if this is a long-term blocker or not.

how to use Hadoop Inputformats with flink shaded s3?

2019-01-31 Thread Cliff Resnick
I need to process some Parquet data from S3 as a unioned input in my DataStream pipeline. From what I know, this requires using the Hadoop AvroParquetInputFormat. The problem I'm running into is that this also requires using un-shaded Hadoop classes that conflict with the Flink shaded hadoop3

Re: Problem building 1.7.1 with scala-2.12

2019-01-03 Thread Cliff Resnick
scala-2.11 profile) > > On 02.01.2019 17:48, Cliff Resnick wrote: > > The build fails at flink-connector-kafka-0.9 because _2.12 libraries > > apparently do not exist for kafka < 0.10. Any help appreciated! > > > > > > -Cliff > > >

Problem building 1.7.1 with scala-2.12

2019-01-02 Thread Cliff Resnick
The build fails at flink-connector-kafka-0.9 because _2.12 libraries apparently do not exist for kafka < 0.10. Any help appreciated! -Cliff

using Kudu binary column in Impala

2018-12-15 Thread Cliff Resnick
We're doing some testing storing HyperLogLog synopsis in Kudu. It works well in Spark, but the hope is to also query through Impala with a UDF. Spark would remain as the writer, with Impala read-only. To work with Impala I'm wondering if it's best to define the HLL data as Kudu string type with

using Kudu binary column in Impala

2018-12-15 Thread Cliff Resnick
We're doing some testing storing HyperLogLog synopsis in Kudu. It works well in Spark, but the hope is to also query through Impala with a UDF. Spark would remain as the writer, with Impala read-only. To work with Impala I'm wondering if it's best to define the HLL data as Kudu string type with

Re: Task Manager allocation issue when upgrading 1.6.0 to 1.6.2

2018-11-12 Thread Cliff Resnick
icationId ` you should see the problem why the TMs > don't start up. > > Cheers, > Till > > On Fri, Nov 9, 2018 at 8:32 PM Cliff Resnick wrote: > >> Hi Till, >> >> Here are Job Manager logs, same job in both 1.6.0 and 1.6.2 at DEBUG >> level. I saw

Re: Flink Web UI does not show specific exception messages when job submission fails

2018-11-09 Thread Cliff Resnick
+1! On Fri, Nov 9, 2018 at 1:34 PM Gary Yao wrote: > Hi, > > We only propagate the exception message but not the complete stacktrace > [1]. > Can you create a ticket for that? > > Best, > Gary > > [1] >

Task Manager allocation issue when upgrading 1.6.0 to 1.6.2

2018-11-08 Thread Cliff Resnick
I'm running a YARN cluster of 8 * 4 core instances = 32 cores, with a configuration of 3 slots per TM. The cluster is dedicated to a single job that runs at full capacity in "FLIP6" mode. So in this cluster, the parallelism is 21 (7 TMs * 3, one container dedicated for Job Manager). When I run

[jira] [Commented] (FLINK-10671) rest monitoring api Savepoint status call fails if akka.ask.timeout < checkpoint duration

2018-11-05 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675184#comment-16675184 ] Cliff Resnick commented on FLINK-10671: --- [~gjy] yes, I think when I created the issue I couldn't

[jira] [Created] (FLINK-10671) rest monitoring api Savepoint status call fails if akka.ask.timeout < checkpoint duration

2018-10-24 Thread Cliff Resnick (JIRA)
Cliff Resnick created FLINK-10671: - Summary: rest monitoring api Savepoint status call fails if akka.ask.timeout < checkpoint duration Key: FLINK-10671 URL: https://issues.apache.org/jira/browse/FLINK-10

[jira] [Created] (FLINK-10671) rest monitoring api Savepoint status call fails if akka.ask.timeout < checkpoint duration

2018-10-24 Thread Cliff Resnick (JIRA)
Cliff Resnick created FLINK-10671: - Summary: rest monitoring api Savepoint status call fails if akka.ask.timeout < checkpoint duration Key: FLINK-10671 URL: https://issues.apache.org/jira/browse/FLINK-10

Re: classloading strangeness with Avro in Flink

2018-08-21 Thread Cliff Resnick
, Aug 21, 2018 at 8:02 AM Cliff Resnick wrote: > Hi Aljoscha, > > We need flink-shaded-hadoop2-uber.jar because there is no hadoop distro on > the instance the Flink session/jobs is managed from and the process that > launches Flink is not a java process, but execs a process that ca

Re: classloading strangeness with Avro in Flink

2018-08-21 Thread Cliff Resnick
5, vino yang wrote: > > Hi Cliff, > > If so, you can explicitly exclude Avro's dependencies from related > dependencies (using ) and then directly introduce dependencies on > the Avro version you need. > > Thanks, vino. > > On Tue, Aug 21, 2018 at 5:13 AM Cliff Resnick wrote

Re: classloading strangeness with Avro in Flink

2018-08-20 Thread Cliff Resnick
the Avro dependency to the user jar. However, since I'm using YARN, I'm required to have flink-shaded-hadoop2-uber.jar loaded from lib -- and that has avro bundled un-shaded. So I'm back to the start problem... Any advice is welcome! -Cliff On Mon, Aug 20, 2018 at 1:42 PM Cliff Resnick wrote: >

Re: classloading strangeness with Avro in Flink

2018-08-20 Thread Cliff Resnick
eratorStateBackend.deserializeStateValues(DefaultOperatorStateBackend.java:584) >> at >> org.apache.flink.runtime.state.DefaultOperatorStateBackend.restore(DefaultOperatorStateBackend.java:399) >> at >> org.apache.flink.streaming.runtime.tasks.StreamTask.createOperatorStateBackend(

Re: classloading strangeness with Avro in Flink

2018-08-20 Thread Cliff Resnick
, 2018 at 8:43 AM Cliff Resnick wrote: > Hi Vino, > > Thanks for the explanation, but the job only ever uses the Avro (1.8.2) > pulled in by flink-formats/avro, so it's not a class version conflict > there. > > I'm using default child-first loading. It might be a further tr

Re: classloading strangeness with Avro in Flink

2018-08-20 Thread Cliff Resnick
documentation.[2] > > [1]: > https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/config.html > [2]: > https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/debugging_classloading.html > > Thanks, vino. > > On Mon, Aug 20, 2018 at 10:40 AM Cliff Resnick wrote: > >

classloading strangeness with Avro in Flink

2018-08-19 Thread Cliff Resnick
Our Flink/YARN pipeline has been reading Avro from Kafka for a while now. We just introduced a source of Avro OCF (Object Container Files) read from S3. The Kafka Avro continued to decode without incident, but the OCF files failed 100% with anomalous parse errors in the decoding phase after the

Re: Mis-reported metrics using SideOutput stream + Broadcast

2018-07-03 Thread Cliff Resnick
stMetadataJoin.broadcast(metadata) for the* late-join*. > Is this intended, or just a copy error? > > > On 03.07.2018 04:16, Cliff Resnick wrote: > > Our topology has a metadata source that we push via Broadcast. Because > this metadata source is critical, but sometimes late, we a

Mis-reported metrics using SideOutput stream + Broadcast

2018-07-02 Thread Cliff Resnick
Our topology has a metadata source that we push via Broadcast. Because this metadata source is critical, but sometimes late, we added a buffering mechanism via a SideOutput. We call the initial look-up from Broadcast "join" and the secondary, state-backed buffered lookup, "late-join". Today I

[jira] [Created] (FLINK-9339) Accumulators are not UI accessible running in FLIP-6 mode

2018-05-11 Thread Cliff Resnick (JIRA)
Cliff Resnick created FLINK-9339: Summary: Accumulators are not UI accessible running in FLIP-6 mode Key: FLINK-9339 URL: https://issues.apache.org/jira/browse/FLINK-9339 Project: Flink

[jira] [Created] (FLINK-9339) Accumulators are not UI accessible running in FLIP-6 mode

2018-05-11 Thread Cliff Resnick (JIRA)
Cliff Resnick created FLINK-9339: Summary: Accumulators are not UI accessible running in FLIP-6 mode Key: FLINK-9339 URL: https://issues.apache.org/jira/browse/FLINK-9339 Project: Flink

Re: "broadcast" tablet replication for kudu?

2018-03-16 Thread Cliff Resnick
dcast tables better? Perhaps Impala can cache >>> small tables locally when doing joins. >>> >>> - Dan >>> >>> On Fri, Mar 16, 2018 at 10:55 AM, Clifford Resnick < >>> cresn...@mediamath.com> wrote: >>> >>>> The problem is,

[jira] [Updated] (FLINK-8616) Missing null check in OperatorChain.CopyingChainingOutput#pushToOperator masks ClassCastException

2018-02-08 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cliff Resnick updated FLINK-8616: - Summary: Missing null check in OperatorChain.CopyingChainingOutput#pushToOperator masks

[jira] [Updated] (FLINK-8616) Missing null check in OperatorChain.copyingChainOutput#pushToOperator masks ClassCastException

2018-02-08 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cliff Resnick updated FLINK-8616: - Summary: Missing null check in OperatorChain.copyingChainOutput#pushToOperator masks

[jira] [Created] (FLINK-8616) Missing null check in OperatorChain#pushToOperator masks ClassCastException

2018-02-08 Thread Cliff Resnick (JIRA)
Cliff Resnick created FLINK-8616: Summary: Missing null check in OperatorChain#pushToOperator masks ClassCastException Key: FLINK-8616 URL: https://issues.apache.org/jira/browse/FLINK-8616 Project

[jira] [Created] (FLINK-8616) Missing null check in OperatorChain#pushToOperator masks ClassCastException

2018-02-08 Thread Cliff Resnick (JIRA)
Cliff Resnick created FLINK-8616: Summary: Missing null check in OperatorChain#pushToOperator masks ClassCastException Key: FLINK-8616 URL: https://issues.apache.org/jira/browse/FLINK-8616 Project

Re: Task Manager detached under load

2018-01-30 Thread Cliff Resnick
I've seen a similar issue while running successive Flink SQL batches on 1.4. In my case, the Job Manager would fail with the log output about unreachability (with an additional statement about something going "horribly wrong"). Under workload pressure, I reverted to 1.3.2 where everything works

CSV writer/parser inconsistency when using the Table API?

2017-12-22 Thread Cliff Resnick
I've been trying out the Table API for some ETL using a two-stage job of CsvTableSink (DataSet) -> CsvInputFormat (Stream). I ran into an issue where the first stage produces output with trailing null values (valid), which causes a parse error in the second stage. Looking at

Re: Question about Drill aggregate queries and schema change

2017-07-25 Thread Cliff Resnick
atch, or a scan batch with injected nullable-int columns, we will > return NONE to the downstream operators directly, which will avoid the > unintended consequence. > > I will probably wrap up that work in a few days, and submit a PR for > review. > > > > On Mon, Jul 24, 2017 a

Re: Question about Drill aggregate queries and schema change

2017-07-24 Thread Cliff Resnick
batch. Nullable int will be injected in downstream operator. > > 1. > https://github.com/apache/drill/blob/master/contrib/ > storage-kudu/src/main/java/org/apache/drill/exec/store/ > kudu/KuduRecordReader.java#L149-L163 > > > On Mon, Jul 24, 2017 at 1:35 PM, Cliff Re

Re: Question about Drill aggregate queries and schema change

2017-07-24 Thread Cliff Resnick
at 2:46 PM, Cliff Resnick <cre...@gmail.com> wrote: > Jinfeng, > > Thanks, that confirms my thoughts as well. If I query using full range > bounds and all hash keys, then Kudu prunes to the exact tablets and there > is no error. I'll watch that jira expectantly becau

Re: Question about Drill aggregate queries and schema change

2017-07-24 Thread Cliff Resnick
rowse/DRILL-5546 > > > > On Mon, Jul 24, 2017 at 10:56 AM, Cliff Resnick <cre...@gmail.com> wrote: > > > I spent some time over the weekend altering Drill's storage-kudu to use > > Kudu's predicate pushdown api. Everything worked great as long as I > > perform

Question about Drill aggregate queries and schema change

2017-07-24 Thread Cliff Resnick
I spent some time over the weekend altering Drill's storage-kudu to use Kudu's predicate pushdown API. Everything worked great as long as I performed flat filtered selects (e.g. "SELECT .. FROM .. WHERE ..") but whenever I tested aggregate queries, they would succeed sometimes, then fail other times

Re: Question about Kudu Storage and storages in general

2017-07-21 Thread Cliff Resnick
Hi Gautam, Though I did get the filter pushdown to Kudu working, I unfortunately encountered sporadic Drill errors when performing aggregate queries. The most common error was: java.lang.IllegalStateException: Failure while reading vector. Expected vector class of

[jira] [Commented] (FLINK-6964) Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore

2017-06-26 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064079#comment-16064079 ] Cliff Resnick commented on FLINK-6964: -- [~srichter] Looks good from this end, all tests passed

[jira] [Commented] (FLINK-6964) Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore

2017-06-26 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063768#comment-16063768 ] Cliff Resnick commented on FLINK-6964: -- [~srichter] So far, looks good! I need to leave early today

[jira] [Commented] (FLINK-6964) Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore

2017-06-23 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061270#comment-16061270 ] Cliff Resnick commented on FLINK-6964: -- looks like it's still trying to register a Placeholder

[jira] [Commented] (FLINK-6964) Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore

2017-06-23 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061257#comment-16061257 ] Cliff Resnick commented on FLINK-6964: -- I ran with your newer precondition. It actually succeeded

[jira] [Commented] (FLINK-6964) Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore

2017-06-23 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061216#comment-16061216 ] Cliff Resnick commented on FLINK-6964: -- ok, will try that. meanwhile here is a run (and hang) from

[jira] [Commented] (FLINK-6964) Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore

2017-06-23 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061178#comment-16061178 ] Cliff Resnick commented on FLINK-6964: -- ok I'll wait on your push > Fix recovery for incremen

[jira] [Commented] (FLINK-6964) Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore

2017-06-23 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061163#comment-16061163 ] Cliff Resnick commented on FLINK-6964: -- Ha! I just started running. ok, will merge and rebuild

[jira] [Commented] (FLINK-6964) Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore

2017-06-23 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061092#comment-16061092 ] Cliff Resnick commented on FLINK-6964: -- I'll merge and rerun. This is what I have for log scope

[jira] [Commented] (FLINK-6964) Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore

2017-06-23 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061088#comment-16061088 ] Cliff Resnick commented on FLINK-6964: -- [~srichter] By hanging I mean that the checkpoint, though

[jira] [Commented] (FLINK-6964) Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore

2017-06-22 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060027#comment-16060027 ] Cliff Resnick commented on FLINK-6964: -- [~srichter] I tried your fix. After resuming from

[jira] [Commented] (FLINK-6633) Register with shared state registry before adding to CompletedCheckpointStore

2017-06-21 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058158#comment-16058158 ] Cliff Resnick commented on FLINK-6633: -- Thanks [~srichter], I'll give this a try tomorrow morning EST

[jira] [Commented] (FLINK-6633) Register with shared state registry before adding to CompletedCheckpointStore

2017-06-20 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056537#comment-16056537 ] Cliff Resnick commented on FLINK-6633: -- Stefan, if you need me to unpack things further please feel

[jira] [Commented] (FLINK-6633) Register with shared state registry before adding to CompletedCheckpointStore

2017-06-20 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056527#comment-16056527 ] Cliff Resnick commented on FLINK-6633: -- Ok, new gist is here: https://gist.github.com/cresny

[jira] [Commented] (FLINK-6633) Register with shared state registry before adding to CompletedCheckpointStore

2017-06-20 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056342#comment-16056342 ] Cliff Resnick commented on FLINK-6633: -- full log here: https://gist.github.com/cresny

[jira] [Commented] (FLINK-6633) Register with shared state registry before adding to CompletedCheckpointStore

2017-06-20 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056334#comment-16056334 ] Cliff Resnick commented on FLINK-6633: -- {noformat} 2017-06-20 18:44:39.376 [ip-10-150-96-228] INFO

[jira] [Commented] (FLINK-6633) Register with shared state registry before adding to CompletedCheckpointStore

2017-06-20 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056332#comment-16056332 ] Cliff Resnick commented on FLINK-6633: -- The issue that [~gyfora] mentioned still exists in current

Question on checkpoint management

2017-05-08 Thread Cliff Resnick
When a job cancel-with-savepoint finishes a successful Savepoint, the preceding last successful Checkpoint is removed. Is this the intended behavior? I thought that checkpoints and savepoints were separate entities and, as such, savepoints should not infringe on checkpoints. This is actually an

[jira] [Created] (FLINK-5646) REST api documentation missing details on jar upload

2017-01-25 Thread Cliff Resnick (JIRA)
Cliff Resnick created FLINK-5646: Summary: REST api documentation missing details on jar upload Key: FLINK-5646 URL: https://issues.apache.org/jira/browse/FLINK-5646 Project: Flink Issue

Re: REST api: how to upload jar?

2017-01-25 Thread Cliff Resnick
> Can you also please open a jira for fixing the documentation? > > - Sachin > > > On Jan 25, 2017 06:55, "Cliff Resnick" <cre...@gmail.com> wrote: > >> The 1.2 release documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring

REST api: how to upload jar?

2017-01-24 Thread Cliff Resnick
The 1.2 release documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/rest_api.html) states "It is possible to upload, run, and list Flink programs via the REST APIs and web frontend". However there is no documentation about uploading a jar via REST api. Does this

Re: Streaming pipeline failing to complete checkpoints at scale

2016-12-23 Thread Cliff Resnick
for a bit): > > - There is some much enhanced Checkpoint Monitoring almost ready to be > merged. That should help in finding out where the barriers get delayed. > > - Finally, we are experimenting with some other checkpoint alignment > variants (alternatives to the BarrierBuffer). We can p

[jira] [Commented] (FLINK-4228) RocksDB semi-async snapshot to S3AFileSystem fails

2016-12-23 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15773084#comment-15773084 ] Cliff Resnick commented on FLINK-4228: -- My last pull request is good to go so I guess it's up to you

Streaming pipeline failing to complete checkpoints at scale

2016-12-23 Thread Cliff Resnick
We are running a DataStream pipeline using Exactly Once/Event Time semantics on 1.2-SNAPSHOT. The pipeline sources from S3 using the ContinuousFileReaderOperator. We use a custom version of the ContinuousFileMonitoringFunction since our source directory changes over time. The pipeline transforms

[jira] [Commented] (FLINK-4228) RocksDB semi-async snapshot to S3AFileSystem fails

2016-12-23 Thread Cliff Resnick (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15772906#comment-15772906 ] Cliff Resnick commented on FLINK-4228: -- The issue now is exclusive to running on YARN with s3a

Re: Resource under-utilization when using RocksDb state backend [SOLVED]

2016-12-08 Thread Cliff Resnick
or how long did you look at iotop. It could be that the IO access happens > in bursts, depending on how data is cached. > > I'll also add Stefan Richter to the conversation, he has maybe some more > ideas what we can do here. > > > On Mon, Dec 5, 2016 at 6:19 PM, Cliff Res

Re: Resource under-utilization when using RocksDb state backend

2016-12-05 Thread Cliff Resnick
ou could look > into tuning the RocksDB settings so that it uses more memory for caching. > > Regards, > Robert > > > On Fri, Dec 2, 2016 at 11:34 PM, Cliff Resnick <cre...@gmail.com> wrote: > >> In tests comparing RocksDb to fs state backend we observe mu

Resource under-utilization when using RocksDb state backend

2016-12-02 Thread Cliff Resnick
In tests comparing RocksDb to fs state backend we observe much lower throughput, around 10x slower. While the lowered throughput is expected, what's perplexing is that machine load is also very low with RocksDb, typically falling to < 25% CPU and negligible IO wait (around 0.1%). Our test
