Hey,

I think there is no real interest in this feature; we don't have users or contributors backing it. The last development was around October 2018, and there have been only ~2 bugfix commits since then. We should stop carrying dead weight. Another 2 weeks have gone by since Stamatis reminded us that after 1.5 years(!) nothing has changed.

+1 on removing it

cheers,
Zoltan

You may inspect some of the recent changes with:
git log -c `find . -type f -path '**/spark/**'|grep -v xml|grep -v properties|grep -v q.out`
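
A slightly more defensive variant of the command above, as a sketch only: it wraps a similar find/grep filter in a function and passes the result to `git log` after `--`, so the paths cannot be mistaken for revisions. The function names `spark_sources` and `spark_log` are made up for illustration, and the sketch assumes none of the matched paths contain whitespace.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hive): list files under any spark/
# directory, skipping paths that mention xml, properties, or q.out.
spark_sources() {
  find "${1:-.}" -type f -path '*/spark/*' \
    | grep -v -e 'xml' -e 'properties' -e 'q\.out'
}

# Show the history of those files; -c prints combined diffs for merges.
# Assumes no whitespace in the paths, since the list is word-split.
# Note: if nothing matches, git log falls back to the whole history.
spark_log() {
  git log -c -- $(spark_sources "${1:-.}")
}
```

Run from the repository root, `spark_log` should behave much like the one-liner; `spark_sources` on its own is handy for checking which files are selected first.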


On 1/28/22 2:32 PM, Stamatis Zampetakis wrote:
Hi team,

Almost one year has passed since the last exchange in this discussion and
if I am not wrong there has been no effort to revive Hive-on-Spark. To be
more precise, I don't think I have seen any Spark related JIRA for quite
some time now and although I don't want to rush into conclusions, there
does not seem to be any community member involved in maintaining or adding
new features in this part of the code.

Keeping dead code in the repository does not do any good to the project and
puts a non-negligible burden on future maintainers.

Clearly, we cannot make a new Hive release where a major feature is
completely untested, so either someone commits to re-enable/fix the
respective tests soon or we move forward with the work started by David and
drop support for Hive-on-Spark.

I would like to ask the community if there is anyone who can take up this
maintenance task and enable/fix Spark related tests in the next month or so?

Best,
Stamatis

On Sat, Feb 27, 2021 at 4:17 AM Edward Capriolo <edlinuxg...@gmail.com>
wrote:

I do not know how it works for most of the world. But at Cloudera, where the
Tez options were never popular, Hive-on-Spark represents a solid way to get
things done on small datasets with lower latency.

As for Spark adoption: a while ago I came up with some ways to make Hive
more Spark-like. One of them was that I found a way to make "compile" a
Hive keyword so folks could build UDFs on the fly. It was such an uphill
climb. Folks found a way to make it disabled by default for security.
Then later, when things moved from CLI to Beeline, it was like the ONLY
thing that I found not ported. It was extremely frustrating.






On Mon, Jul 27, 2020 at 3:19 PM David <dam6...@gmail.com> wrote:

Hello Xuefu,

I am not part of the Cloudera Hive product team, though I volunteer to
work on small projects from time to time. Perhaps someone from that team
can chime in with some of their thoughts, but personally, I think that in
the long run, there will be more of a merge between Hive-on-Spark and other
Spark-native offerings. I'm not sure what the differentiation will be
going forward. With that said, are there any developers on this mailing
list who are willing to take on the maintenance effort of keeping HoS
moving forward?

http://www.russellspitzer.com/2017/05/19/Spark-Sql-Thriftserver/


https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.4/bk_spark-component-guide/content/config-sts.html


Thanks.

On Thu, Jul 23, 2020 at 12:35 PM Xuefu Zhang <xu...@apache.org> wrote:

Previous reasoning seemed to suggest a lack of user adoption. Now we are
concerned about ongoing maintenance effort. Both are valid considerations.
However, I think we should have ways to find out the answers. Therefore, I
suggest the following be carried out:

1. Send out the proposal (removing Hive on Spark) to users including
u...@hive.apache.org and get their feedback.
2. Ask if any developers on this mailing list are willing to take on the
maintenance effort.

I'm concerned about user impact because I can still see issues being
reported on HoS from time to time. I'm more concerned about the future of
Hive if we narrow Hive neutrality on execution engines, which will possibly
force more Hive users to migrate to other alternatives such as Spark SQL,
which is already eroding Hive's user base.

Openness and neutrality used to be Hive's most admired strengths.

Thanks,
Xuefu


On Wed, Jul 22, 2020 at 8:46 AM Alan Gates <alanfga...@gmail.com>
wrote:

An important point here is I don't believe David is proposing to remove
Hive on Spark from the 2 or 3 lines, but only from trunk. Continuing to
support it in existing 2 and 3 lines makes sense, but since no one has
maintained it on trunk for some time and it does not work with many of the
newer features it should be removed from trunk.

Alan.

On Tue, Jul 21, 2020 at 4:10 PM Chao Sun <sunc...@apache.org> wrote:

Thanks David. FWIW Uber is still running Hive on Spark (2.3.4) on a very
large scale in production right now and I don't think we have any plan to
change it soon.



On Tue, Jul 21, 2020 at 11:28 AM David <dam6...@gmail.com> wrote:

Hello,

Thanks for the feedback.

Just a quick recap: I did propose this @dev and I received unanimous +1's
from the community. After a couple months, I created the PR.

Certainly open to discussion, but there hasn't been any discussion thus far
because there have been no objections until this point.

HoS has low adoption, heavy technical debt, and the manner in which its
build process is set up is impeding some other work that is not even
related to HoS.

We can deprecate in Hive 3.x and remove in Hive 4.x. The plan would be to
use Tez moving forward.

My point about the vendor's move to Tez is that HoS adoption is very low,
it's only going lower, and while I don't know the specifics of it, there
must be some migration plan in place there (i.e., it must be possible to
do it already).

Thanks,
David

On Tue, Jul 21, 2020 at 12:23 PM Xuefu Zhang <xu...@apache.org>
wrote:

Hi David,

While a vendor may not support a component in an open source project,
removing it or not is a decision by and for the community. I certainly
understand that the vendor you mentioned has contributed a great deal
(including my personal effort while working there), but it's not up to the
vendor to make a call like what is proposed here.

As a community, we should have gone through a thorough discussion and
reached a consensus before actually making such a big change, in my
opinion.

Thanks,
Xuefu

On Tue, Jul 21, 2020 at 8:49 AM David <dam6...@gmail.com>
wrote:

Hey,

Thanks for the input.

FYI. Cloudera (Cloudera + Hortonworks) have removed HoS from their latest
offering.

"Tez is now the only supported execution engine, existing queries that
change execution mode to Spark or MapReduce within a session, for example,
fail."

https://docs.cloudera.com/cdp/latest/upgrade-post/topics/ug_hive_configuration_changes.html


So I don't know who will be supporting this feature moving forward, but
there has been a lot of work done to make this change as painless as
possible. Simply setting the engine to 'tez' and removing the HoS-related
settings should address many use cases.
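
As a rough sketch of what that cleanup might look like: assuming session overrides are kept in a flat `key=value` file (an assumption for illustration; a real deployment may keep them in `hive-site.xml` or elsewhere), the helper below drops lines whose keys start with `spark.` or `hive.spark.` and pins `hive.execution.engine` to `tez`. The function name `to_tez` is made up, and this is not an official migration tool.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a flat key=value settings file for Tez.
# Prints the result to stdout; the input file is left untouched.
to_tez() {
  # Drop Spark-related keys and any existing engine setting.
  # '|| true' keeps going when grep filters out every line.
  grep -v -E '^(spark\.|hive\.spark\.|hive\.execution\.engine=)' "$1" || true
  # Pin the execution engine to Tez.
  echo 'hive.execution.engine=tez'
}
```

For example, `to_tez overrides.properties > overrides.tez.properties` would write a cleaned copy that can be reviewed before use (the file names here are placeholders).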

Thanks.

On Tue, Jul 21, 2020 at 11:36 AM Xuefu Z <usxu...@gmail.com>
wrote:

Sorry for chiming in late. However, I don't think we should remove Hive on
Spark just because of a technical problem. This is rather a big decision
that we need to be careful about. There are users that will be left high
and dry by this move.

If the community decides to desupport and eventually remove it, I think we
need to have a due process. We also need a deprecation plan if that's what
we decide to do. Before that, I'm -1 on this proposal.

Thanks,
Xuefu

On Tue, Jul 21, 2020 at 7:57 AM David <dam6...@gmail.com>
wrote:

Hello Team,

https://github.com/apache/hive/pull/1285

Thanks.

On Wed, Jun 3, 2020 at 11:49 PM Gopal V <gop...@apache.org> wrote:


+1

Cheers,
Gopal

On 6/3/20 7:48 PM, Jesus Camacho Rodriguez wrote:
+1

-Jesús

On Wed, Jun 3, 2020 at 1:58 PM Alan Gates <alanfga...@gmail.com> wrote:

+1.

Alan.

On Wed, Jun 3, 2020 at 1:40 PM Prasanth Jayachandran
<pjayachand...@cloudera.com.invalid> wrote:

+1

On Jun 3, 2020, at 1:38 PM, Ashutosh Chauhan <hashut...@apache.org> wrote:

+1

On Wed, Jun 3, 2020 at 1:23 PM David Mollitor <dam6...@gmail.com> wrote:

Hello Gang,

I have spent some time working on upgrading Avro (far less than others):

https://issues.apache.org/jira/browse/HIVE-21737

This should be a relatively easy thing to do, but is blocked by
Hive-on-Spark. HoS has a weird thing where it downloads some
cloud-storage-hosted file of Spark-Hadoop as part of its maven run.

Since HoS is not going to receive updates from the major vendors, is it
time to simply remove it?

Tests are currently disabled:
https://issues.apache.org/jira/browse/HIVE-23137

Thanks.









--
Xuefu Zhang

"In Honey We Trust!"