Hi Jacques,


> On 21 May 2019, at 22:11, Jacques Nadeau <jacq...@dremio.com> wrote:
> 
>> It’s not at all clear why unique keys would be needed at all.
> 
> If we turn your questions around, you answer yourself. If you have 
> independent writers, you need unique keys.
> 
>> Also, truly independent writers (like a job writing while a job compacts)
>> means effectively a distributed transaction, and I believe it’s clearly out
>> of scope for Iceberg to solve that?
> 
> Assuming a single process is writing seems severely limiting in design and 
> scale. I'm also surprised that you would think this is outside of Iceberg's 
> scope. A table format that can only be modified by a single process basically 
> locks that format into a single tool for a particular deployment.


That's my point: truly independent writers (two Spark jobs, or a Spark job and
a Dremio job) mean a distributed transaction. It would need yet another
external transaction coordinator on top of both Spark and Dremio; Iceberg by
itself cannot solve this.
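
To make the coordination point concrete, here is a toy sketch (hypothetical
names, not the actual Iceberg API) of the atomic arbitration both writers
would have to share, and of what is still left unsolved even when they do:

    import java.util.concurrent.atomic.AtomicLong

    // Toy model of the shared arbiter: an atomic pointer to the table's
    // current snapshot id (hypothetical, not the actual Iceberg API).
    object Arbiter {
      private val current = new AtomicLong(0L)

      // A commit succeeds only if the writer still sees the latest snapshot.
      def commit(expectedId: Long, nextId: Long): Boolean =
        current.compareAndSet(expectedId, nextId)

      def currentId: Long = current.get()
    }

    // Writer A (a Spark job) and writer B (a compaction job) both read
    // snapshot 0 and race to commit snapshot 1; exactly one CAS wins, and
    // the loser must re-read and reconcile its work against the winner's
    // changes. That cross-engine reconciliation is the coordination that
    // has to live somewhere outside both engines.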

By single writer I don't mean a single process; I mean multiple coordinated
processes, like Spark executors coordinated by the Spark driver. The
coordinator ensures that the data is pre-partitioned on each executor, and the
coordinator commits the snapshot.
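
Sketched roughly in Scala against Iceberg's Java API (the writeDataFile
helper and the way the slices reach the executors are stand-ins, not real
API):

    import org.apache.iceberg.{DataFile, Table}

    // Sketch of the coordinated-writer pattern: tasks write data files in
    // parallel on the executors, but only the driver commits, so the whole
    // job produces exactly one new snapshot.
    def commitJob[T](table: Table,
                     slices: Seq[Seq[T]],              // pre-partitioned data
                     writeDataFile: Seq[T] => DataFile // runs on an executor
                    ): Unit = {
      val dataFiles = slices.map(writeDataFile)    // conceptually parallel

      val append = table.newAppend()               // one operation, on the driver
      dataFiles.foreach(f => append.appendFile(f)) // register every new file
      append.commit()                              // single atomic commit
    }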

Note, however, that a single writer job with multiple concurrent reader jobs
is perfectly feasible; i.e., it shouldn't be a problem to write from a Spark
job and read from multiple Dremio queries concurrently (for example).



> 
>> Uniqueness - enforcing uniqueness at scale is not feasible (provably so).
> 
> Expecting uniqueness is different than enforcing it. If you're saying it is
> impossible to enforce, I understand that. That doesn't mean we can't define
> a system where it is expected and there are ramifications if it is not
> maintained.


I'm not sure what you mean exactly. If we can't enforce uniqueness, we
shouldn't assume it. We do expect that the natural key is unique most of the
time, but both the eager and the lazy natural-key designs can handle
duplicates consistently. Basically, it's not a problem to have duplicate
natural keys; everything works fine.
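
For example, if the upsert on the natural key is defined as "delete every row
matching the key, then insert the new row", pre-existing duplicates are simply
collapsed by the next write. A toy model of that semantics (plain Scala,
nothing Iceberg-specific):

    // Delete-then-insert upsert on a natural key. Even if the table already
    // contains duplicate keys, the upsert removes *all* matching rows before
    // inserting, so the outcome is consistent either way.
    final case class Row(key: String, value: Int)

    def upsert(table: Vector[Row], row: Row): Vector[Row] =
      table.filterNot(_.key == row.key) :+ row

    val withDuplicates = Vector(Row("a", 1), Row("a", 2), Row("b", 3))
    upsert(withDuplicates, Row("a", 9))
    // == Vector(Row("b", 3), Row("a", 9)): both duplicate "a" rows are
    // replaced, whether the delete is applied eagerly or lazily.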



> 
>> Also, at scale, it’s really only feasible to do query and update/upsert on
>> the partition/bucket/sort key; any other access is likely a full scan of
>> terabytes of data, on remote storage.
> 
> I'm not sure why you would say that unless you assume a particular
> implementation. Single record deletion is definitely an important use case.
> There is no need to do a full table scan to accomplish that unless you're
> assuming an eager approach to deletion.

Let me try to clarify each point:

- Lookup for query or update on a non-(partition/bucket/sort) key predicate
implies scanning large amounts of data, because those are the only data
structures that can narrow down the lookup, right? One could argue that the
min/max index (file skipping) can be applied to any column, but in reality, if
that column is not sorted, the min/max intervals can have huge overlaps, so it
may be next to useless (see the toy sketch after this list).
- Remote storage: this is a critical architecture decision. Implementations on
local storage imply a vastly different design for the entire system, both
storage and compute.
- Deleting single records per snapshot is infeasible in the eager design, but
particularly so in the lazy one: each deletion creates a very small snapshot.
Deleting 1 million records one at a time would create 1 million small files
and 1 million RPC calls.
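
To illustrate the min/max point with made-up numbers (plain Scala):

    // A file can be skipped for the predicate `col = v` only if v falls
    // outside its [min, max] range for that column.
    final case class FileStats(name: String, min: Int, max: Int)

    def canSkip(f: FileStats, v: Int): Boolean = v < f.min || v > f.max

    // Data sorted on the column: the ranges are disjoint, so 2 of 3 files
    // are skipped for v = 150.
    val sorted = Seq(FileStats("f1", 0, 99), FileStats("f2", 100, 199),
                     FileStats("f3", 200, 299))
    sorted.count(canSkip(_, 150)) // 2

    // Same data written unsorted: every file's range spans almost the whole
    // domain, nothing is skipped, and the lookup degenerates into a full scan.
    val unsorted = Seq(FileStats("f1", 1, 297), FileStats("f2", 0, 299),
                       FileStats("f3", 2, 298))
    unsorted.count(canSkip(_, 150)) // 0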



> 
> I do continue to wonder how much of this back and forth is the mixing of 
> thinking around restatement (eager) versus delta (lazy) implementations. 
> Maybe we should separate them out as two different conversations?
> 


Eager is conceptually just lazy + compaction done, well, eagerly. The logic
for both is exactly the same; the trade-off is that with eager you implicitly
compact on every write so that you do no extra work on read, while with lazy
you amortize the cost of compaction over multiple snapshots.

Basically, there should be no difference between the two conceptually, or with
regard to keys, etc. The only difference is some mechanics in the
implementation.
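
As a toy illustration of that equivalence (plain Scala, with a Map standing in
for the table state and for each delta batch):

    // `merge` folds a stack of delta batches (upserts keyed by the natural
    // key) into a base state. Both designs use this exact same logic.
    def merge(base: Map[String, Int],
              deltas: Seq[Map[String, Int]]): Map[String, Int] =
      deltas.foldLeft(base)(_ ++ _)

    // Lazy: store base + deltas as-is, run the merge on every read.
    def readLazy(base: Map[String, Int],
                 deltas: Seq[Map[String, Int]]): Map[String, Int] =
      merge(base, deltas)

    // Eager: run the same merge at write time (compact immediately), so a
    // read does no extra work.
    def writeEager(base: Map[String, Int],
                   deltas: Seq[Map[String, Int]]): Map[String, Int] =
      merge(base, deltas)

    // readLazy(base, deltas) == writeEager(base, deltas) for any inputs:
    // the two designs differ only in when the merge runs, not in what it does.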


