This new design makes sense to me. We add just 2 more bytes to store
nanosOfMicro, and everything else stays the same as the current timestamp
types: same value range, but higher precision.
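
To check my understanding, here is a minimal sketch in Python (not Spark
code) of the split representation and of the range argument. The function
names and the 10-byte big-endian layout are assumptions made up for
illustration; they are not taken from the SPIP, which leaves physical
encodings open.

```python
import struct

NANOS_PER_MICRO = 1_000
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def split_epoch_nanos(epoch_nanos: int) -> tuple[int, int]:
    """Split epoch nanoseconds into (epochMicros, nanosOfMicro)."""
    # divmod floors toward negative infinity, so the remainder stays in
    # [0, 999] even for instants before 1970.
    epoch_micros, nanos_of_micro = divmod(epoch_nanos, NANOS_PER_MICRO)
    if not INT64_MIN <= epoch_micros <= INT64_MAX:
        raise OverflowError("epochMicros does not fit in a signed 64-bit Long")
    return epoch_micros, nanos_of_micro


def join_epoch_nanos(epoch_micros: int, nanos_of_micro: int) -> int:
    """Recombine the two fields into epoch nanoseconds."""
    if not 0 <= nanos_of_micro < NANOS_PER_MICRO:
        raise ValueError("nanosOfMicro must be in [0, 999]")
    return epoch_micros * NANOS_PER_MICRO + nanos_of_micro


def pack(epoch_micros: int, nanos_of_micro: int) -> bytes:
    """Hypothetical layout: 8-byte signed Long followed by 2-byte signed Short."""
    return struct.pack(">qh", epoch_micros, nanos_of_micro)


def unpack(buf: bytes) -> tuple[int, int]:
    """Inverse of pack(): returns (epochMicros, nanosOfMicro)."""
    return struct.unpack(">qh", buf)


def approx_half_range_years(units_per_second: int) -> float:
    """Approximate years on each side of the epoch that a signed 64-bit count covers."""
    seconds = 2**63 / units_per_second
    return seconds / 86_400 / 365.2425


if __name__ == "__main__":
    micros, nanos = split_epoch_nanos(-1)  # one nanosecond before the epoch
    print(micros, nanos)
    print(len(pack(micros, nanos)))  # 8 bytes today plus 2 more
    print(approx_half_range_years(10**9))  # Long epoch-nanos
    print(approx_half_range_years(10**6))  # Long epoch-micros
```

With these assumptions a value takes 10 bytes instead of 8, and the calendar
range is still set by the Long of micros (roughly 292,000 years on each side
of the epoch) rather than by a Long of nanos (roughly 292 years on each side).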

On Thu, May 7, 2026 at 5:16 PM Max Gekk <[email protected]> wrote:

> Hi Spark devs,
>
> I’d like to share a proposal for nanosecond-capable timestamp support
> and ask for your feedback.
>
> Here is the SPIP:
>
> https://docs.google.com/document/d/1DeW15QueI4PdRyPm6C6jsTZFmIjbXX2j4h-Ja5W_fsg/edit?usp=sharing
>
> My proposal uses a logical split representation:
> - epochMicros: Long
> - nanosOfMicro: Short in [0, 999]
>
> This applies to both NTZ and LTZ nano-capable types; timezone
> semantics remain unchanged and are handled at interpretation
> boundaries (as today).
>
> Why this approach? I believe this is the most practical path for Spark
> because it:
> 0. Conforms to the SQL standard.
> 1. Preserves Spark’s existing microsecond approach. Most
> Catalyst/runtime datetime logic already uses micros. The split model
> extends it rather than replacing it.
> 2. Avoids the INT64 epoch-nanos range cliff in the primary engine model.
> A single Long epoch-nanos representation constrains the calendar range
> much more aggressively than Long micros.
> 3. Keeps migration risk lower. Existing microsecond behavior remains
> default; nano precision is opt-in via parameterized types/syntax.
> 4. Allows efficient implementation paths. Internals can still choose
> compact physical encodings (row/vector/file boundaries), while keeping
> one canonical logical contract.
>
> Related SPIPs considered. I reviewed and compared against these two drafts:
> - SPIP: Support NanoSecond Timestamps:
>
> https://docs.google.com/document/d/1wjFsBdlV2YK75x7UOk2HhDOqWVA0yC7iEiqOMnNnxlA/edit?tab=t.0#heading=h.4kibaxwtx2xo
> - SPIP: Support NanoSecond Timestamp Types:
>
> https://docs.google.com/document/d/1Q5u1whAO_KcT6d4dFFaIMy_S3RoQEo4Znwz2U-nbhls/edit?tab=t.0#heading=h.xk16mmomv6il
>
> Those drafts are valuable and informed this design. The key difference
> is that I prioritize micros-first engine continuity with a bounded
> nano remainder, instead of making epoch-nanos the primary internal
> semantic unit.
> In short: I think epochMicros + nanosOfMicro is a better fit for
> Spark’s current architecture and compatibility constraints, while
> still delivering practical nanosecond support.
>
> Thanks in advance for your feedback.
>
> Best regards,
> Max Gekk
