Quick question regarding the same topic. If I construct the data frame *df*
in the following way:
key | value | topic | offset | partition | timestamp
foo1 | value1 | topic1 | ...
foo2 | value2 | topic2 | ...
what
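A hedged guess at where this is going, since the message is cut off: a DataFrame with key, value, and topic columns can be written to Kafka without a fixed topic option, because the Kafka sink routes each row by its topic column. A minimal Scala sketch (the bootstrap address is illustrative):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// One row per record; the topic column decides where each row lands.
val df = Seq(
  ("foo1", "value1", "topic1"),
  ("foo2", "value2", "topic2")
).toDF("key", "value", "topic")

df.write
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")  // illustrative address
  .save()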
Here is a simple example that reproduces the problem. This code fails with a
missing attribute ('kk') error. Is it a bug? Note that if the `select`
in line B is removed, the code runs.
import pyspark.sql.functions as F
df =
Can you file a JIRA if this is a bug?
Thanks!
On Sat, Mar 24, 2018 at 1:23 AM, Michael Shtelma wrote:
> Hi Maropu,
>
> the problem seems to be in FilterEstimation.scala on lines 50 and 52:
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala#L50-L52
>
I am still stuck on this. Does anyone know the correct way to use a custom
Aggregator for a case class with agg?
I'd like to use the Dataset API, but it looks like in aggregation Spark loses
the type and falls back to GenericRowWithSchema instead of my case class. Is that right?
Thanks
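For reference, a minimal sketch of a typed Aggregator used through the Dataset API; the case class and column names are illustrative, not from the original question:

import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

case class Record(key: String, amount: Long)

object SumAmount extends Aggregator[Record, Long, Long] {
  def zero: Long = 0L
  def reduce(buf: Long, r: Record): Long = buf + r.amount
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(reduction: Long): Long = reduction
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val ds = Seq(Record("a", 1L), Record("a", 2L), Record("b", 3L)).toDS()

// groupByKey plus the Aggregator's toColumn keeps the typed API end to end,
// so the result is a Dataset[(String, Long)], not GenericRowWithSchema.
val result = ds.groupByKey(_.key).agg(SumAmount.toColumn.name("total"))
result.show()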
Quick question:
How do I add --jars /path/to/sparklens_2.11-0.1.0.jar to the
spark-defaults conf? Should it be
spark.driver.extraClassPath /path/to/sparklens_2.11-0.1.0.jar, or should I
use the spark.jars option? Could anyone give an example of how it should
look, and if I the path for the
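A hedged sketch of the spark-defaults.conf entry, not verified against this user's setup: spark.jars ships the jar to the driver and executors and puts it on both classpaths, which is usually what an instrumentation tool like sparklens needs, whereas spark.driver.extraClassPath only prepends to the driver's classpath and expects the file to already exist at that path on the driver machine.

# in spark-defaults.conf
spark.jars  /path/to/sparklens_2.11-0.1.0.jar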
Hi Maropu,
the problem seems to be in FilterEstimation.scala on lines 50 and 52:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala#L50-L52
val filterSelectivity =
Hi:
I am working on Spark Structured Streaming (2.2.1) with Kafka and want 100
executors to stay alive. I set spark.executor.instances to 100. The process
starts running with 100 executors, but after some time only a few remain,
which causes a backlog of events from Kafka.
I thought I saw a
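A hedged guess, since the message is cut off: if dynamic allocation is enabled, idle executors are released after spark.dynamicAllocation.executorIdleTimeout even when spark.executor.instances asks for 100, which would match this symptom. Pinning a fixed count would look like:

# in spark-defaults.conf (or the equivalent --conf flags)
spark.dynamicAllocation.enabled  false
spark.executor.instances         100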
Hi,
A couple of questions about the internals of the persist mechanism (RDD, but
maybe also applicable to DS/DF).
Data is processed stage by stage, so what actually runs on the worker nodes
is the computation of the partitions of a stage's result, not the individual
RDDs. Operations on all the RDDs
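To make the stage-pipelining point concrete, a minimal sketch (the data and storage level are illustrative, not from the original message):

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder.master("local[*]").getOrCreate()
val sc = spark.sparkContext

// Narrow transformations are pipelined per partition within a stage; persist
// stores each partition as it is computed, so later actions reuse the cached
// blocks instead of recomputing the lineage.
val squares = sc.parallelize(1 to 1000000, 8)
  .map(n => n.toLong * n)
  .persist(StorageLevel.MEMORY_AND_DISK)

println(squares.count())  // first action computes and caches the partitions
println(squares.sum())    // second action reads the cached blocks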
Hi all,
I'm trying the new native ORC implementation in Spark 2.3
(org.apache.spark.sql.execution.datasources.orc).
I've compiled Spark 2.3 from the git branch-2.3 as of March 20th.
I also get the same error with Spark 2.2 from Hortonworks HDP 2.6.4.
*NOTE*: the error only occurs with zlib compression, and
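For anyone trying to reproduce this, a sketch of how the native implementation is selected in Spark 2.3; the path and test data are illustrative, assuming a spark-shell session where `spark` is predefined:

// spark.sql.orc.impl chooses between the new "native" reader and the old "hive" one.
spark.conf.set("spark.sql.orc.impl", "native")

val df = spark.range(10).withColumn("x", org.apache.spark.sql.functions.lit("a"))
df.write.option("compression", "zlib").orc("/tmp/orc-zlib-test")  // illustrative path
spark.read.orc("/tmp/orc-zlib-test").show()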
You can use a static import to import it directly:
import static org.apache.spark.sql.functions.lit;
Femi
From: 崔苗
Date: Friday, March 23, 2018 at 8:34 AM
To: "user@spark.apache.org"
Subject: how to use lit() in spark-java
Hi Guys,
I want to add a
You have to import functions:
dataset.withColumn(columnName, functions.lit("constant"))
Thank you
Anil Langote
Sent from my iPhone
From: 崔苗
Sent: Friday, March 23, 2018 8:33 AM
Subject: how to use lit() in spark-java
To:
Hi
hi,
What's a query to reproduce this?
It seems that when casting a double to BigDecimal, it throws the exception.
// maropu
On Fri, Mar 23, 2018 at 6:20 PM, Michael Shtelma wrote:
> Hi all,
>
> I am using Spark 2.3 with activated cost-based optimizer and a couple
> of hive
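Maropu's hunch is easy to check in isolation: the BigDecimal(double) constructor rejects NaN and infinity, which would produce exactly this stack trace if the estimated value is ever NaN (the value below is just for demonstration):

// Throws java.lang.NumberFormatException("Infinite or NaN") at
// BigDecimal.<init>, matching the trace quoted elsewhere in this thread.
val bd = new java.math.BigDecimal(Double.NaN)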
Hi Guys,
I want to add a constant column to a dataset with the lit function in Java, like this:
dataset.withColumn(columnName, lit("constant"))
but it seems that IDEA couldn't find the lit() function, so how do I use the
lit() function in Java?
thanks for any reply
It is in the yarn module:
"org.apache.spark.deploy.yarn.security.ServiceCredentialProvider".
2018-03-23 15:10 GMT+08:00 Jorge Machado :
> Hi Jerry,
>
> where do you see that class in Spark? I only found
> HadoopDelegationTokenManager
> and I don’t see any way to add my Provider into
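A hedged sketch of a custom provider against that yarn-module trait; the method signatures are from memory of Spark 2.x, and the Accumulo token logic is a placeholder:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.Credentials
import org.apache.spark.SparkConf
import org.apache.spark.deploy.yarn.security.ServiceCredentialProvider

class AccumuloCredentialProvider extends ServiceCredentialProvider {
  override def serviceName: String = "accumulo"

  // Only ask for tokens when they are actually needed.
  override def credentialsRequired(hadoopConf: Configuration): Boolean = true

  // Fetch an Accumulo delegation token, add it to creds, and return the
  // next renewal time in ms (None if the token is not renewable).
  override def obtainCredentials(
      hadoopConf: Configuration,
      sparkConf: SparkConf,
      creds: Credentials): Option[Long] = {
    // ... Accumulo-specific token retrieval would go here ...
    None
  }
}

Providers are discovered through Java's ServiceLoader, so the jar also needs a
META-INF/services/org.apache.spark.deploy.yarn.security.ServiceCredentialProvider
file listing the implementation class name.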
Hi everybody,
this is my first message to the mailing list. In a world of DataFrames and
Structured Streaming, my use cases may be considered corner cases, but I
still think it's important to address such problems and dig deep into
understanding how Spark RDDs work.
We have an application
Hi all,
I am using Spark 2.3 with activated cost-based optimizer and a couple
of hive tables, that were analyzed previously.
I am getting the following exception for different queries:
java.lang.NumberFormatException
at java.math.BigDecimal.&lt;init&gt;(BigDecimal.java:494)
at
Hi,
I have a collection of text documents, and I extracted the list of
significant terms from that collection.
I want to calculate a co-occurrence matrix for the extracted terms using
Spark.
I stored the collection of text documents in a DataFrame,
StructType schema = *new*
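A hedged sketch of one way to build a term co-occurrence matrix with the DataFrame API; the column names and sample documents are illustrative, not from the question:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// One row per document, each holding the significant terms found in it.
val docs = Seq(
  (1, Seq("spark", "kafka", "streaming")),
  (2, Seq("spark", "streaming"))
).toDF("docId", "terms")

// Pair every term with every other term in the same document, then count
// how many documents each pair shares.
val t1 = docs.select($"docId", explode($"terms").as("term1"))
val t2 = docs.select($"docId", explode($"terms").as("term2"))
val cooccur = t1.join(t2, "docId")
  .where($"term1" < $"term2")  // keep each unordered pair once
  .groupBy("term1", "term2").count()

cooccur.show()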
Hi Jerry,
where do you see that class in Spark? I only found
HadoopDelegationTokenManager and I don’t see any way to add my Provider into
it.
private def getDelegationTokenProviders: Map[String,
HadoopDelegationTokenProvider] = {
val providers = List(new
I think you can build your own Accumulo credential provider, similar to
HadoopDelegationTokenProvider, outside of Spark. Spark already provides an
interface, "ServiceCredentialProvider", for users to plug in a customized
credential provider.
Thanks
Jerry
2018-03-23 14:29 GMT+08:00 Jorge Machado
Hi Guys,
I’m in the middle of writing a Spark data source connector for Apache Spark to
connect to Accumulo tablets. Because we have Kerberos, it gets a little tricky,
because Spark only handles the delegation tokens for HBase, Hive, and HDFS.
Would be a PR for an implementation of
Use a streaming query listener that tracks repeated progress events for the
same batch id. If x amount of time has elapsed with repeated progress events
for the same batch id, the source is not providing new offsets and stream
execution is not scheduling new micro-batches. See also:
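A hedged sketch of the listener idea described above: flag a stalled source when the same batchId keeps reappearing for longer than a timeout. The threshold and the logging are illustrative choices, not from the original message:

import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

class StallListener(timeoutMs: Long) extends StreamingQueryListener {
  @volatile private var lastBatchId = -1L
  @volatile private var firstSeenAt = 0L

  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    val batchId = event.progress.batchId
    val now = System.currentTimeMillis()
    if (batchId != lastBatchId) {
      lastBatchId = batchId
      firstSeenAt = now
    } else if (now - firstSeenAt > timeoutMs) {
      // Same batch id for too long: no new offsets, no new micro-batches.
      println(s"Query ${event.progress.id} appears stalled on batch $batchId")
    }
  }
}

// Registration: spark.streams.addListener(new StallListener(timeoutMs = 60000))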