I think it’s very simple conceptually (well the way I’ve been thinking
about it), and any limitations can be derived from this using logic. If
someone has specific questions we can clarify; if some limitations are
missing (e.g. when Hive was using stats for hbase tables) it is a bug.

For managed tables, Hive owns and manages the data:

* Data, properties, etc. can only be changed via Hive commands or APIs;
Hive assumes it’s true and any other modification on the underlying
storage is undefined behavior (not just results - basically all bets are
off ;)).
* Data is attached to the Hive entity (partition, table). Whatever happens
to the entity happens to the data (e.g. the behavior mentioned in the
wiki, where  dropping the table/partition deletes the data).


For external tables (also SH tables before Hive made those explicitly
external), Hive doesn’t manage or own the data:

* Hive assumes the data can be modified externally at any time. Any
feature that requires assumptions about data being unchanged (queries from
stats, ACID, some MV, query cache) won’t work. If it works, it’s a bug.
* Data is detached from Hive entities, e.g. dropping the table by default
doesn’t delete the data - we assume someone else owns it and we are just
pointing at it.
* Additionally, since Hive cannot track modifications, external
products/users/etc. are fully responsible for any concurrency control. In
particular, modifying data under the running query may produce
inconsistent results (no atomicity between tasks), undefined results due
to retries, etc.





On 18/8/21, 10:31, "Mithun RK" <mythro...@gmail.com> wrote:

>+1. I'm out of the loop as well. I'd be keen on grokking what's not
>available with EXTERNAL tables, given that these are popular on our
>clusters as well. It would be good to have that documented in one place.
>
>On Tue, Aug 21, 2018 at 8:19 AM Lars Francke <lars.fran...@gmail.com>
>wrote:
>
>> As this came up with yet another customer and I've heard more things
>>that I
>> believe to be false:
>>
>> Please, anyone have any kind of feedback for me here? Is my list
>>correct,
>> can you add stuff to it, have more details to certain points etc.?
>>
>> Thank you!
>>
>> On Wed, Aug 1, 2018 at 12:17 PM, Lars Francke <lars.fran...@gmail.com>
>> wrote:
>>
>> > I have opened https://issues.apache.org/jira/browse/HIVE-20287 and
>>would
>> > love to get feedback
>> >
>> > On Wed, Aug 1, 2018 at 11:14 AM, Lars Francke <lars.fran...@gmail.com>
>> > wrote:
>> >
>> >> Thanks Lefty! I haven't created a JIRA but it's on my list.
>> >>
>> >> Still interested in any input anyone might have?
>> >>
>> >> Some of you guys _must_ have some knowledge on this stuff.
>> >>
>> >> You're correct that the DROP stuff is documented. What I meant is
>>that I
>> >> couldn't find the relevant parts in the code.
>> >>
>> >> On Sat, Jul 21, 2018 at 6:15 AM, Lefty Leverenz <
>> leftylever...@gmail.com>
>> >> wrote:
>> >>
>> >>> Agreed, the Hive wiki is woefully incomplete.  But it does mention
>>DROP
>> >>> for
>> >>> external tables in two sections of the DDL doc:
>> >>>
>> >>>    - External Tables
>> >>>    <https://cwiki.apache.org/confluence/display/Hive/LanguageM
>> >>> anual+DDL#LanguageManualDDL-ExternalTables>:
>> >>>     "When dropping an EXTERNAL table, data in the table is NOT
>>deleted
>> >>> from
>> >>>    the file system."
>> >>>    - Drop Table
>> >>>    <https://cwiki.apache.org/confluence/display/Hive/LanguageM
>> >>> anual+DDL#LanguageManualDDL-DropTable>:
>> >>>     "When dropping an EXTERNAL table, data in the table will NOT be
>> >>> deleted
>> >>>    from the file system."
>> >>>
>> >>> A wiki page comparing managed and external tables would be very
>> helpful,
>> >>> and the list is a good start.  I suggest you open a JIRA issue.
>> >>>
>> >>> Who can help Lars with this task?
>> >>>
>> >>> -- Lefty
>> >>>
>> >>>
>> >>> On Thu, Jul 19, 2018 at 3:04 PM Lars Francke
>><lars.fran...@gmail.com>
>> >>> wrote:
>> >>>
>> >>> > Hi,
>> >>> >
>> >>> > I've been wondering if anyone can tell me what the differences and
>> >>> > limitations of managed vs. external tables are these days.
>> >>> >
>> >>> > I've seen the docs from Hortonworks[1] that list these features as
>> not
>> >>> > supported by External tables:
>> >>> > * Query cache
>> >>> > * Materialized views, except in a limited way
>> >>> > * Default statistics gathering
>> >>> > * Compute queries using statistics
>> >>> > * Automatic runtime filtering
>> >>> > * File merging after insert
>> >>> >
>> >>> > But there are no details. For some of them I'm not even sure what
>> they
>> >>> are
>> >>> > supposed to mean because I couldn't find any documentation in our
>> Wiki
>> >>> :(
>> >>> >
>> >>> > So I'd love to create a Wiki page that describes the differences
>> >>> between
>> >>> > managed & external tables and list relevant JIRAs but I need your
>> help
>> >>> for
>> >>> > that.
>> >>> >
>> >>> > I scanned the code for all references to TableType and this is the
>> >>> list I
>> >>> > found but I'm almost certainly missing something:
>> >>> >
>> >>> > * ARCHIVE/UNARCHIVE - DDLTask - Only works for managed tables
>> >>> > * TRUNCATE - DDLSemanticAnalyzer - Only works for managed tables
>> >>> > * MERGE/CONCATENATE - HiveRelOpMaterializationValidator - Only
>>works
>> >>> for
>> >>> > managed tables
>> >>> > * Constraints - DDLSemanticAnalyzer -  (NOT NULL, DEFAULT, CHECK,
>> only
>> >>> RELY
>> >>> > ist allowed)
>> >>> > * IMPORT - ImportSemanticAnalyzer - This has some wild
>>restrictions I
>> >>> > didn't follow for external tables
>> >>> > * Query Results Caching - https://issues.apache.org/jira
>> >>> /browse/HIVE-18513
>> >>> > SemanticAnalyzer - Documentation missing for Results Cache
>> >>> >
>> >>> > So there's a bunch of those from the Hortonworks list missing. If
>> >>> anyone
>> >>> > could point me to documentation or code for those that'd be great.
>> >>> >
>> >>> > I also didn't find the most obvious/oldest one: Data is not
>>deleted
>> for
>> >>> > EXTERNAL tables on DROP, I'm sure I just missed it.
>> >>> >
>> >>> > And last but not least: I have not looked at the grammar to see
>> what's
>> >>> > already forbidden at that level.
>> >>> >
>> >>> > As I said: Any hints would be greatly appreciated.
>> >>> >
>> >>> > Thank you!
>> >>> >
>> >>> > Lars
>> >>> >
>> >>> > A side note: There are so many fantastic features in Hive but
>>because
>> >>> lots
>> >>> > of them are not or under-documented most people I know just stick
>>to
>> >>> the
>> >>> > basics. That's a pity :(
>> >>> >
>> >>> > [1] <
>> >>> >
>> >>> > https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/usi
>> >>> ng-hiveql/content/hive_hive_3_tables.html
>> >>> > >
>> >>> >
>> >>>
>> >>
>> >>
>> >
>>

Reply via email to