Re: New here - Python client to pandas

Igor Gois Tue, 02 Feb 2021 09:23:13 -0800

Hi Steve,

Executing only the query took 1,67s:


%%time
session = Session(ip, port_, username_, password_,fetch_size,zone_id)
session.open(False)
session_data_set.set_fetch_size(1024)
session_data_set = session.execute_query_statement("select
s9,s10,s11,s12,s13,s14,s15,s16,s17,s18,s19,s22,s23,s24,s25,s26,s27,s28,s29,s30,s31,s32,s33,s61,s62,s64,s72,s56,s57
from root.sae.trends.1 order by time desc")

output:

setting time zone_id as -0300, message: None
Wall time: 1.67 s


The same query + dataset to dataframe process took 48s:

def sql_to_pandas(sql):

    session_data_set = session.execute_query_statement(sql)
    results_columns = session_data_set.get_column_names()
    x = []

    while session_data_set.has_next():
        x.append(tuple(session_data_set.next().__str__().split()))

    results = pd.DataFrame(x,columns=results_columns)
    results['Time'] = results['Time'].astype('int64').apply(lambda x:
x/1000)
    results['Time'] = pd.to_datetime(results['Time'], unit="s")

    return results

# execute sql query statement
x = sql_to_pandas("select
s9,s10,s11,s12,s13,s14,s15,s16,s17,s18,s19,s22,s23,s24,s25,s26,s27,s28,s29,s30,s31,s32,s33,s61,s62,s64,s72,s56,s57
from root.sae.trends.1 order by time desc")

output:

[40778 rows x 30 columns]
Wall time: 48.4 s


If someone has any idea on how to improve this, please let me know

Thank you so much everyone

Att,

Igor Gois






Am Di., 2. Feb. 2021 um 12:57 Uhr schrieb Steve Su <[email protected]>:

> Hi Igor,
>
> > Is there a faster way to do this?
>
> From my point of view, your current implementation should be fast enough.
>
> > When I remove the 'limit 100' it takes too long. The time series has 72
> sensors with 40k datapoints each.
>
> For IoTDB, this amount of data is not large. I want to know how much time
> was spent in executing the query (excluding the time to convert the dataset
> into a DataFrame).
>
> In addition, you can also have a check on the proportion of unsequence
> data. When there is a lot of unsequence data, the query may be slow.
> Sequence data and unsequence data are under the folders data/data/sequence
> and data/data/unsequence respectively.
>
> Best,
> Steve Su
>
> ------------------ 原始邮件 ------------------
> 发件人: "dev" <[email protected]>;
> 发送时间: 2021年2月2日(星期二) 晚上8:09
> 收件人: "dev"<[email protected]>;
> 主题: Re: New here - Python client to pandas
>
> Thank you, Xiangdong
>
> Here is the python code that I used:
>
> # execute sql query statement
> session_data_set = session.execute_query_statement("select
>
> s9,s10,s11,s12,s13,s14,s15,s16,s17,s18,s19,s22,s23,s24,s25,s26,s27,s28,s29,s30,s31,s32,s33,s61,s62,s64,s72,s56,s57
> from root.sae.trends.1 order by time desc limit 100")
>
> session_data_set.set_fetch_size(1024)
> results_columns = session_data_set.get_column_names()
> print(session_data_set.get_column_types())
> print(results_columns)
>
> results = pd.DataFrame(columns=results_columns)
>
> while session_data_set.has_next():
>     row = session_data_set.next()
>     row_str = row.__str__().split()
>     row_str[0] =
> datetime.fromtimestamp(row.get_timestamp()/1000).strftime("%Y-%m-%d
> %H:%M:%S")
>
>     results =
> results.append(pd.Series(row_str,index=results_columns),ignore_index=True)
>
>
> print(results)
> session_data_set.close_operation_handle()
>
> When I remove the 'limit 100' it takes too long. The time series has 72
> sensors with 40k datapoints each.
>
> Is there a faster way to do this?
>
> Thank you in advance
>
>
> IoTDB> select count(s1) from root.sae.trends.1
> +---------------------------+
> |count(root.sae.trends.1.s1)|
> +---------------------------+
> |                      40778|
> +---------------------------+
>
> IoTDB> count timeseries root
> +-----+
> |count|
> +-----+
> |   72|
> +-----+
>
> Att,
>
> Igor Gois
>
>
>
>
>
>
> Am Di., 2. Feb. 2021 um 02:00 Uhr schrieb Xiangdong Huang <
> [email protected]>:
>
> > By the way, indeed we should consider whether our iotdb-session python
> API
> > is efficient enough (we have many array operations and serialization
> > operations in the Python lib).
> >
> > As I am not an expert of Python, I cannot get the conclusion. Also call
> > contributors.
> >
> > Best,
> > -----------------------------------
> > Xiangdong Huang
> > School of Software, Tsinghua University
> >
> >  黄向东
> > 清华大学 软件学院
> >
> >
> > Xiangdong Huang <[email protected]> 于2021年2月2日周二 下午12:56写道：
> >
> > > Hi Igor,
> > >
> > > > I am sorry if this is not the right place to ask
> > >
> > > The mailing list is absolutely the right place:D
> > > But the problem is our mailing list only supports plain texts.
> > > If you have to upload screenshots, please create a jira (
> > > https://issues.apache.org/jira/projects/IOTDB/issues).
> > >
> > > Actually several days ago, my mate @[email protected]
> > > <[email protected]>  also discussed with me about integration IoTDB
> > > python with Pandas.
> > > I think it is a great idea, and we can develop it together.
> > >
> > > Best,
> > > -----------------------------------
> > > Xiangdong Huang
> > > School of Software, Tsinghua University
> > >
> > >  黄向东
> > > 清华大学 软件学院
> > >
> > >
> > > Igor Gois <[email protected]> 于2021年2月2日周二 上午6:23写道：
> > >
> > >> Hi everyone,
> > >>
> > >> I am sorry if this is not the right place to ask, but I couldn't
> install
> > >> QQ or we chat. I am from Brazil.
> > >>
> > >> I am using a python client and I would like to know if there is a way
> to
> > >> transform a session dataset (session.execute_query_statement) into a
> > pandas
> > >> dataframe?
> > >>
> > >> I tried with the example code in the while loop. It worked, but it was
> > >> not so fast.
> > >>
> > >> I am in a project to store and analyze wind turbines data and iotdb
> > seems
> > >> to be a great fit.
> > >>
> > >> Thanks in advance and sorry if this is the wrong place
> > >>
> > >> [image: image.png]
> > >>
> > >>
> > >> Igor Gois
> > >>
> > >>
> > >>
> > >>
> > >>
> >

Re: New here - Python client to pandas

Reply via email to