Re: [PR] Part-1: Pinot Timeseries Engine SPI [pinot]

via GitHub Tue, 24 Sep 2024 06:07:07 -0700


gortiz commented on code in PR #13885:
URL: https://github.com/apache/pinot/pull/13885#discussion_r1773308788



##########
pinot-timeseries/pinot-timeseries-spi/src/main/java/org/apache/pinot/tsdb/spi/series/TimeSeries.java:
##########
@@ -0,0 +1,113 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.tsdb.spi.series;
+
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import javax.annotation.Nullable;
+import org.apache.pinot.tsdb.spi.TimeBuckets;
+
+
+/**
+ * Logically, a time-series is a list of pairs of time and data values, where time is stored in increasing order.
+ * A time-series is identified using its ID, which can be retrieved using {@link #getId()}.
+ * A time series typically also has a set of pairs of keys and values which are called tags or labels.
+ * We allow a Series to store time either via {@link TimeBuckets} or via a long array as in {@link #getTimeValues()}.
+ * Using {@link TimeBuckets} is ideal when your queries are working on evenly spaced time ranges. The other option
+ * exists to support use-cases such as "Instant Vectors" in PromQL.
+ * <p>
+ *   <b>Warning:</b> The time and value arrays passed to the Series are not copied, and can be modified by anyone with
+ *   access to them. This is by design, to make it easier to re-use buffers during time-series operations.
+ * </p>
+ */
+public class TimeSeries {
+  private final String _id;
+  private final Long[] _timeValues;
+  private final TimeBuckets _timeBuckets;
+  private final Double[] _values;
+  private final List<String> _tagNames;
+  private final Object[] _tagValues;

Review Comment:
   I guess this is fine for now, but IIUC we are going to have tons of these objects at runtime, so thinking about memory layout is pretty important. We should plan to create different TimeSeries for different data types with specific memory layouts. My main concern is that we are using boxed arrays, which cost considerably more memory than primitive arrays, since every element is a reference to a separately allocated object. Substituting `_values` with a `double[]` plus a `BitSet` that marks the nulls should be much cheaper in terms of memory and faster in terms of calculation. The same applies to `_timeValues`.
   
   Note that I'm assuming the number of values will be on the order of thousands at most, so `BitSet` should be better than `RoaringBitmap`.
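   A minimal sketch of the layout suggested above, assuming nulls are tracked with `java.util.BitSet` where a set bit at index `i` means the value at `i` is null. The class name `PrimitiveSeriesValues` and its methods are hypothetical, not part of the Pinot SPI in this PR. Like the javadoc's warning describes for `TimeSeries`, the constructor does not copy the arrays it receives.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch (not part of the Pinot SPI): stores series values in a primitive
 * double[] and records which positions are null in a BitSet, instead of using Double[].
 */
final class PrimitiveSeriesValues {
  private final double[] _values;
  // A set bit at index i means the value at index i is null.
  private final BitSet _nulls;

  // Arrays are not copied, mirroring the buffer-reuse design described in the javadoc.
  PrimitiveSeriesValues(double[] values, BitSet nulls) {
    _values = values;
    _nulls = nulls;
  }

  /** Converts a boxed array once, marking null entries in the BitSet. */
  static PrimitiveSeriesValues fromBoxed(Double[] boxed) {
    double[] values = new double[boxed.length];
    BitSet nulls = new BitSet(boxed.length);
    for (int i = 0; i < boxed.length; i++) {
      if (boxed[i] == null) {
        nulls.set(i); // slot in values[] stays at its default 0.0
      } else {
        values[i] = boxed[i];
      }
    }
    return new PrimitiveSeriesValues(values, nulls);
  }

  int size() {
    return _values.length;
  }

  boolean isNull(int index) {
    return _nulls.get(index);
  }

  /** Returns the raw value; callers should check isNull first (null slots hold 0.0). */
  double getDouble(int index) {
    return _values[index];
  }

  /** Example aggregation that skips nulls without unboxing any objects. */
  double sumNonNull() {
    double sum = 0;
    for (int i = 0; i < _values.length; i++) {
      if (!_nulls.get(i)) {
        sum += _values[i];
      }
    }
    return sum;
  }
}
```

   The same pattern could apply to `_timeValues` with a `long[]`; if timestamps can never be null, the `BitSet` could be dropped for them entirely.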



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


