kingswanwho commented on a change in pull request #2277:
URL: https://github.com/apache/drill/pull/2277#discussion_r675609015
##########
File path: _docs/zh/tutorials/030-analyzing-the-yelp-academic-dataset.md
##########
@@ -4,41 +4,33 @@ slug: "Analyzing the Yelp Academic Dataset"
 parent: "教程"
 lang: "zh"
 ---
-Apache Drill is one of the fastest growing open source projects, with the community making rapid progress with monthly releases. The key difference is Drill’s agility and flexibility.
-Along with meeting the table stakes for SQL-on-Hadoop, which is to achieve low
-latency performance at scale, Drill allows users to analyze the data without
-any ETL or up-front schema definitions. The data can be in any file format
-such as text, JSON, or Parquet. Data can have simple types such as strings,
-integers, dates, or more complex multi-structured data, such as nested maps and
-arrays. Data can exist in any file system, local or distributed, such as HDFS or S3. Drill, has a “no schema” approach, which enables you to get
-value from your data in just a few minutes.
-
-Let’s quickly walk through the steps required to install Drill and run it
-against the Yelp data set. The publicly available data set used for this
-example is downloadable from [Yelp](http://www.yelp.com/dataset_challenge)
-(business reviews) and is in JSON format.
+
+Apache Drill 是发展最快的开源项目之一,社区快速发展并每月保持新版本发布。Drill 的与众不同之处在于敏捷性和灵活性。
+为了满足 SQL 查询 Hadoop,并规模化减少延迟,Drill允许用户不必进行 ETL 流程或者 预先定义 schema。文件可以是任意格式,比如:纯文本,JSON 或者 Parquet。
+数据可以是简单的字符串,整数,日期,也可以是更复杂的多结构数据,比如嵌套地图和数组。数据可以保存在任意文件系统,本地或者分布式,比如 HDFS 或者 S3。Drill 具备 “no schema” 方法,

Review comment:
resolved

##########
File path: _docs/zh/tutorials/030-analyzing-the-yelp-academic-dataset.md
##########
@@ -154,17 +146,14 @@ You can directly query self-describing files such as JSON, Parquet, and text. Th
 | Spartan Animal Hospital | 07:30 | 18:00 |
 |----------------------------|------------|------------|
-Note how Drill can traverse and refer through multiple levels of nesting.
+请注意 Drill 如何遍历和引用多层级的嵌套数据。
+
-### 3\. Get the amenities of each business in the data set
+### 3\. 从数据集中得到每个商家的便利设施情况
-Note that the attributes column in the Yelp business data set has a different
-element for every row, representing that businesses can have separate
-amenities. Drill makes it easy to quickly access data sets with changing
-schemas.
+请注意 Yelp 商家数据集中,属性列的每一行都有不同的元素,代表商家有不同的便利设施。Drill 通过改变 schema 更简单的快速访问数据集。
-First, change Drill to work in all text mode (so we can take a look at all of
-the data).
+首先,更改配置使 Drill 可以识别所有的文本格式(我们便可查看所有的数据)。

Review comment:
resolved

##########
File path: _docs/zh/tutorials/030-analyzing-the-yelp-academic-dataset.md
##########
@@ -310,13 +296,11 @@ of the reviews themselves.
 | Wicked Spoon |
 |-------------------------------|
-#### Create a view with the combined business and reviews data sets
+#### 创建了连接商户和评论数据集后的视图
-Note that Drill views are lightweight, and can just be created in the local
-file system. Drill in standalone mode comes with a dfs.tmp workspace, which we
-can use to create views (or you can can define your own workspaces on a local
-or distributed file system). If you want to persist the data physically
-instead of in a logical view, you can use CREATE TABLE AS syntax.
+Drill 的视角是轻量级的,且只是创建在本地文件系统。

Review comment:
resolved
