Hi Steve,
Executing only the query took 1,67s:
%%time
session = Session(ip, port_, username_, password_,fetch_size,zone_id)
session.open(False)
session_data_set.set_fetch_size(1024)
session_data_set = session.execute_query_statement("select
s9,s10,s11,s12,s13,s14,s15,s16,s17,s18,s19,s22,s23,s24,s25,s26,s27,s28,s29,s30,s31,s32,s33,s61,s62,s64,s72,s56,s57
from root.sae.trends.1 order by time desc")
output:
setting time zone_id as -0300, message: None
Wall time: 1.67 s
The same query + dataset to dataframe process took 48s:
def sql_to_pandas(sql):
session_data_set = session.execute_query_statement(sql)
results_columns = session_data_set.get_column_names()
x = []
while session_data_set.has_next():
x.append(tuple(session_data_set.next().__str__().split()))
results = pd.DataFrame(x,columns=results_columns)
results['Time'] = results['Time'].astype('int64').apply(lambda x:
x/1000)
results['Time'] = pd.to_datetime(results['Time'], unit="s")
return results
# execute sql query statement
x = sql_to_pandas("select
s9,s10,s11,s12,s13,s14,s15,s16,s17,s18,s19,s22,s23,s24,s25,s26,s27,s28,s29,s30,s31,s32,s33,s61,s62,s64,s72,s56,s57
from root.sae.trends.1 order by time desc")
output:
[40778 rows x 30 columns]
Wall time: 48.4 s
If someone has any idea on how to improve this, please let me know
Thank you so much everyone
Att,
Igor Gois
Am Di., 2. Feb. 2021 um 12:57 Uhr schrieb Steve Su <[email protected]>:
> Hi Igor,
>
> > Is there a faster way to do this?
>
> From my point of view, your current implementation should be fast enough.
>
> > When I remove the 'limit 100' it takes too long. The time series has 72
> sensors with 40k datapoints each.
>
> For IoTDB, this amount of data is not large. I want to know how much time
> was spent in executing the query (excluding the time to convert the dataset
> into a DataFrame).
>
> In addition, you can also have a check on the proportion of unsequence
> data. When there is a lot of unsequence data, the query may be slow.
> Sequence data and unsequence data are under the folders data/data/sequence
> and data/data/unsequence respectively.
>
> Best,
> Steve Su
>
> ------------------ 原始邮件 ------------------
> 发件人: "dev" <[email protected]>;
> 发送时间: 2021年2月2日(星期二) 晚上8:09
> 收件人: "dev"<[email protected]>;
> 主题: Re: New here - Python client to pandas
>
> Thank you, Xiangdong
>
> Here is the python code that I used:
>
> # execute sql query statement
> session_data_set = session.execute_query_statement("select
>
> s9,s10,s11,s12,s13,s14,s15,s16,s17,s18,s19,s22,s23,s24,s25,s26,s27,s28,s29,s30,s31,s32,s33,s61,s62,s64,s72,s56,s57
> from root.sae.trends.1 order by time desc limit 100")
>
> session_data_set.set_fetch_size(1024)
> results_columns = session_data_set.get_column_names()
> print(session_data_set.get_column_types())
> print(results_columns)
>
> results = pd.DataFrame(columns=results_columns)
>
> while session_data_set.has_next():
> row = session_data_set.next()
> row_str = row.__str__().split()
> row_str[0] =
> datetime.fromtimestamp(row.get_timestamp()/1000).strftime("%Y-%m-%d
> %H:%M:%S")
>
> results =
> results.append(pd.Series(row_str,index=results_columns),ignore_index=True)
>
>
> print(results)
> session_data_set.close_operation_handle()
>
> When I remove the 'limit 100' it takes too long. The time series has 72
> sensors with 40k datapoints each.
>
> Is there a faster way to do this?
>
> Thank you in advance
>
>
> IoTDB> select count(s1) from root.sae.trends.1
> +---------------------------+
> |count(root.sae.trends.1.s1)|
> +---------------------------+
> | 40778|
> +---------------------------+
>
> IoTDB> count timeseries root
> +-----+
> |count|
> +-----+
> | 72|
> +-----+
>
> Att,
>
> Igor Gois
>
>
>
>
>
>
> Am Di., 2. Feb. 2021 um 02:00 Uhr schrieb Xiangdong Huang <
> [email protected]>:
>
> > By the way, indeed we should consider whether our iotdb-session python
> API
> > is efficient enough (we have many array operations and serialization
> > operations in the Python lib).
> >
> > As I am not an expert of Python, I cannot get the conclusion. Also call
> > contributors.
> >
> > Best,
> > -----------------------------------
> > Xiangdong Huang
> > School of Software, Tsinghua University
> >
> > 黄向东
> > 清华大学 软件学院
> >
> >
> > Xiangdong Huang <[email protected]> 于2021年2月2日周二 下午12:56写道:
> >
> > > Hi Igor,
> > >
> > > > I am sorry if this is not the right place to ask
> > >
> > > The mailing list is absolutely the right place:D
> > > But the problem is our mailing list only supports plain texts.
> > > If you have to upload screenshots, please create a jira (
> > > https://issues.apache.org/jira/projects/IOTDB/issues).
> > >
> > > Actually several days ago, my mate @[email protected]
> > > <[email protected]> also discussed with me about integration IoTDB
> > > python with Pandas.
> > > I think it is a great idea, and we can develop it together.
> > >
> > > Best,
> > > -----------------------------------
> > > Xiangdong Huang
> > > School of Software, Tsinghua University
> > >
> > > 黄向东
> > > 清华大学 软件学院
> > >
> > >
> > > Igor Gois <[email protected]> 于2021年2月2日周二 上午6:23写道:
> > >
> > >> Hi everyone,
> > >>
> > >> I am sorry if this is not the right place to ask, but I couldn't
> install
> > >> QQ or we chat. I am from Brazil.
> > >>
> > >> I am using a python client and I would like to know if there is a way
> to
> > >> transform a session dataset (session.execute_query_statement) into a
> > pandas
> > >> dataframe?
> > >>
> > >> I tried with the example code in the while loop. It worked, but it was
> > >> not so fast.
> > >>
> > >> I am in a project to store and analyze wind turbines data and iotdb
> > seems
> > >> to be a great fit.
> > >>
> > >> Thanks in advance and sorry if this is the wrong place
> > >>
> > >> [image: image.png]
> > >>
> > >>
> > >> Igor Gois
> > >>
> > >>
> > >>
> > >>
> > >>
> >