gortiz commented on code in PR #13885: URL: https://github.com/apache/pinot/pull/13885#discussion_r1773308788
##########
pinot-timeseries/pinot-timeseries-spi/src/main/java/org/apache/pinot/tsdb/spi/series/TimeSeries.java:
##########
@@ -0,0 +1,113 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.tsdb.spi.series;
+
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import javax.annotation.Nullable;
+import org.apache.pinot.tsdb.spi.TimeBuckets;
+
+
+/**
+ * Logically, a time-series is a list of pairs of time and data values, where time is stored in increasing order.
+ * A time-series is identified using its ID, which can be retrieved using {@link #getId()}.
+ * A time series typically also has a set of pairs of keys and values which are called tags or labels.
+ * We allow a Series to store time either via {@link TimeBuckets} or via a long array as in {@link #getTimeValues()}.
+ * Using {@link TimeBuckets} is ideal when your queries are working on evenly spaced time ranges. The other option
+ * exists to support use-cases such as "Instant Vectors" in PromQL.
+ * <p>
+ * <b>Warning:</b> The time and value arrays passed to the Series are not copied, and can be modified by anyone with
+ * access to them.
This is by design, to make it easier to re-use buffers during time-series operations.
+ * </p>
+ */
+public class TimeSeries {
+  private final String _id;
+  private final Long[] _timeValues;
+  private final TimeBuckets _timeBuckets;
+  private final Double[] _values;
+  private final List<String> _tagNames;
+  private final Object[] _tagValues;

Review Comment:
I guess this is fine for now, but IIUC we are going to have tons of these objects at runtime, so thinking about memory layout is pretty important. We should plan to create different TimeSeries implementations for different data types, each with its own memory layout. My main concern is that we are using boxed arrays, which are close to twice as expensive in terms of memory as a primitive array. Replacing `_values` with a `double[]` plus a `BitSet` that marks the nulls should be considerably cheaper in memory and faster to compute on. The same applies to `_timeValues`. Note that I'm assuming the number of values per series is in the order of thousands at most, so `BitSet` should be a better fit than `RoaringBitmap`.
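
A minimal sketch of the layout suggested in the review comment, in case it helps the discussion. `NullableDoubleSeries` and all of its methods are hypothetical names invented for illustration. They are not part of the PR or of the Pinot SPI, and the actual `TimeSeries` class may end up looking quite different.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch, not part of the PR: values live in a primitive double[],
 * and a BitSet records which positions are null.
 */
final class NullableDoubleSeries {
  private final double[] _values;
  private final BitSet _nulls;

  private NullableDoubleSeries(double[] values, BitSet nulls) {
    _values = values;
    _nulls = nulls;
  }

  /** Converts a boxed Double[] (the representation in the diff) into the primitive layout. */
  static NullableDoubleSeries fromBoxed(Double[] boxed) {
    double[] values = new double[boxed.length];
    BitSet nulls = new BitSet(boxed.length);
    for (int i = 0; i < boxed.length; i++) {
      if (boxed[i] == null) {
        nulls.set(i);          // mark the slot as null; values[i] stays 0.0
      } else {
        values[i] = boxed[i];  // unbox once, at construction time
      }
    }
    return new NullableDoubleSeries(values, nulls);
  }

  int length() {
    return _values.length;
  }

  boolean isNull(int index) {
    return _nulls.get(index);
  }

  /** Returns the raw value; callers are expected to check isNull(index) first. */
  double get(int index) {
    return _values[index];
  }

  /** Sums the non-null values by walking only the clear bits of the null mask. */
  double sum() {
    double total = 0;
    for (int i = _nulls.nextClearBit(0); i < _values.length; i = _nulls.nextClearBit(i + 1)) {
      total += _values[i];
    }
    return total;
  }
}
```

With this shape, the null mask costs one bit per slot instead of one object reference per slot, and aggregation loops read the `double[]` directly without unboxing.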