GitHub user chenlica closed a discussion: Queries and Datasets (from old wiki)

>From the page https://github.com/apache/texera/wiki/Queries-and-Datasets (may 
>be dangling)

====

# Datasets
## 1. A snippet of the Twitter dataset. 

Each tweet is stored in JSON format. To view a record in a readable form, we 
suggest an online JSON viewer such as 
[JsonViewer](http://jsonviewer.stack.hu/).

> <pre><code>{"create_at": "2017-03-26T16:39:13.000Z", "id": 846144537676431360, "text": "@carrieunderwood @opry hi carrie", "in_reply_to_status": 845836315648315392, "in_reply_to_user": 386244525, "favorite_count": 0, "retweet_count": 0, "lang": "en", "is_retweet": false, "user_mentions": [386244525, 19772559], "user": {"id": 4217866818, "name": "Lisa", "screen_name": "296_3676", "profile_image_url": "http://pbs.twimg.com/profile_images/664966945435992065/Tw4npe2S_normal.jpg", "lang": "en", "location": "null", "create_at": "2015-11-12", "description": "null", "followers_count": 31, "friends_count": 67, "statues_count": 182}, "place": {"country": "United States", "country_code": "United States", "full_name": "Beaver Dam, WI", "id": "1389f2635209d576", "name": "Beaver Dam", "place_type": "city", "bounding_box": [[-88.870587, 43.431528], [-88.786438, 43.508406]]}, "geo_tag": {"stateID": 55, "stateName": "Wisconsin", "countyID": 55027, "countyName": "Dodge", "cityID": 5505900, "cityName": "Beaver Dam"}}</code></pre>
> <pre><code>{"create_at": "2017-08-09T11:22:09.000Z", "id": 895349496749596672, "text": "Join the Noodles &amp; Co. team! See our latest #job opening here: https://t.co/NljQxBaLc0 #Veterans #MilSpouse #Greenville, NC #Hiring", "in_reply_to_status": -1, "in_reply_to_user": -1, "favorite_count": 0, "coordinate": [-77.3818152, 35.5794052], "retweet_count": 0, "lang": "en", "is_retweet": false, "hashtags": ["job", "Veterans", "MilSpouse", "Greenville", "Hiring"], "user": {"id": 88254516, "name": "TMJ-NC HRTA Jobs", "screen_name": "tmj_nc_hrta", "profile_image_url": "http://pbs.twimg.com/profile_images/667871532920639488/VroXHje4_normal.jpg", "lang": "en", "location": "North Carolina", "create_at": "2009-11-07", "description": "Follow this account for geo-targeted Hospitality/Restaurant/Tourism job tweets in North Carolina. Need help? Tweet us at @CareerArc!", "followers_count": 399, "friends_count": 275, "statues_count": 474}, "place": {"country": "United States", "country_code": "United States", "full_name": "North Carolina, USA", "id": "3b98b02fba3f9753", "name": "North Carolina", "place_type": "admin", "bounding_box": [[-84.321948, 33.752879], [-75.40012, 36.588118]]}, "geo_tag": {"stateID": 37, "stateName": "North Carolina", "countyID": 37147, "countyName": "Pitt", "cityID": 3728080, "cityName": "Greenville"}}</code></pre>
> <pre><code>{"create_at": "2017-06-12T10:05:40.000Z", "id": 874311754897137666, "text": "I told Tommy he had an obsession to something and he goes \"you&apos;re my obsession\" and wow it was so cute I love him so much\ufffd\ufffd", "in_reply_to_status": -1, "in_reply_to_user": -1, "favorite_count": 0, "retweet_count": 0, "lang": "en", "is_retweet": false, "user": {"id": 398387501, "name": "Amber Hargis", "screen_name": "AmberHargis", "profile_image_url": "http://pbs.twimg.com/profile_images/873390596974669824/2P3J0Hiw_normal.jpg", "lang": "en", "location": "Columbus, OH", "create_at": "2011-10-25", "description": "@tommymalone2\u2764\ufe0f", "followers_count": 825, "friends_count": 912, "statues_count": 9757}, "place": {"country": "United States", "country_code": "United States", "full_name": "Gahanna, OH", "id": "c97807ac2cd60207", "name": "Gahanna", "place_type": "city", "bounding_box": [[-82.905845, 39.987076], [-82.802554, 40.05651]]}, "geo_tag": {"stateID": 39, "stateName": "Ohio", "countyID": 39049, "countyName": "Franklin", "cityID": 3929106, "cityName": "Gahanna"}}</code></pre>

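Because each tweet is one JSON object with nested `user`, `place`, and `geo_tag` objects, reading it takes little more than a JSON parse plus a few guarded lookups. The sketch below is only an illustration: it uses an abbreviated copy of the first record above (same values, fewer fields), and the helper name `summarize` is hypothetical rather than part of any Texera or CORE code.

```python
import json

# Abbreviated copy of the first tweet above: same values, fewer fields.
RAW = """
{"create_at": "2017-03-26T16:39:13.000Z", "id": 846144537676431360,
 "text": "@carrieunderwood @opry hi carrie", "lang": "en", "is_retweet": false,
 "user_mentions": [386244525, 19772559],
 "user": {"id": 4217866818, "name": "Lisa", "followers_count": 31},
 "geo_tag": {"stateID": 55, "stateName": "Wisconsin", "cityName": "Beaver Dam"}}
"""


def summarize(tweet: dict) -> dict:
    """Flatten a few nested fields (hypothetical helper for illustration)."""
    geo = tweet.get("geo_tag", {})
    return {
        "id": tweet["id"],
        "text": tweet["text"],
        "user_name": tweet["user"]["name"],
        "state": geo.get("stateName"),
        "city": geo.get("cityName"),
        # "hashtags" and "user_mentions" appear only in some of the records
        # shown above, so they default to empty lists here.
        "hashtags": tweet.get("hashtags", []),
        "mentions": tweet.get("user_mentions", []),
    }


tweet = json.loads(RAW)
summary = summarize(tweet)
print(summary["user_name"], summary["city"], summary["state"])
```

Optional keys are read with `dict.get` because the three sample records do not all carry the same fields (for example, only the second one has `hashtags` and `coordinate`).
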
## 2. A snippet of the COCO dataset. 
> <pre><code>{"id": 10000, "text": "train2014/COCO_train2014_000000105363.jpg"}</code></pre>
> <pre><code>{"id": 10001, "text": "val2014/COCO_val2014_000000402233.jpg"}</code></pre>
> <pre><code>{"id": 10002, "text": "val2014/COCO_val2014_000000559252.jpg"}</code></pre>
![COCO_train2014_000000105363](https://user-images.githubusercontent.com/41463232/146872216-ae15080c-61f5-4cc2-a4df-dff7bd9c5e1b.jpg)|![COCO_val2014_000000402233](https://user-images.githubusercontent.com/41463232/146872226-2335574c-d872-4759-bbfc-5a3729e478b5.jpg)|![COCO_val2014_000000559252](https://user-images.githubusercontent.com/41463232/146872228-cce1102f-669f-4d82-a105-fc0885c8fab4.jpg)
:-------------------------:|:-------------------------:|:-------------------------:


## 3. A snippet of the UCF101 dataset. 
> <pre><code>{"id": 5000, "text": "Haircut/v_Haircut_g20_c02.avi"}</code></pre>
> <pre><code>{"id": 5001, "text": "ApplyLipstick/v_ApplyLipstick_g19_c02.avi"}</code></pre>
> <pre><code>{"id": 5002, "text": "HandstandWalking/v_HandstandWalking_g02_c01.avi"}</code></pre>

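The `text` field of each UCF101 record is a relative path whose file name encodes the action class. A minimal sketch of splitting such a path is shown below. It assumes the usual UCF101 naming pattern `v_<Action>_g<group>_c<clip>.avi`; the reading of `g` as a group number and `c` as a clip number, and the function name `parse_ucf101_path`, are assumptions of this example and are not stated on this page.

```python
import re
from pathlib import PurePosixPath

# Assumed file-name pattern: v_<Action>_g<group>_c<clip>.avi
NAME_PATTERN = re.compile(
    r"^v_(?P<action>[A-Za-z]+)_g(?P<group>\d+)_c(?P<clip>\d+)\.avi$"
)


def parse_ucf101_path(path: str) -> dict:
    """Split a path such as 'Haircut/v_Haircut_g20_c02.avi' into its parts."""
    p = PurePosixPath(path)
    match = NAME_PATTERN.match(p.name)
    if match is None:
        raise ValueError(f"unexpected UCF101 file name: {p.name}")
    return {
        "folder": p.parent.name,          # directory, e.g. "Haircut"
        "action": match["action"],        # action class taken from the file name
        "group": int(match["group"]),     # assumed meaning of the g-number
        "clip": int(match["clip"]),       # assumed meaning of the c-number
    }


info = parse_ucf101_path("Haircut/v_Haircut_g20_c02.avi")
print(info)
```
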
# Queries
## 1. Ten queries on the Twitter dataset. 
### To study the behavior of CORE with different numbers of predicates, we randomly select five queries with two strongly correlated predicates and five queries with three strongly correlated predicates.
Id|Queries
:-------------------------:|:-------------------------
q<sub>1</sub>| SentimentStanfordNLP ('negative', 'neutral')&rarr; POSTaggerSpacyLG ('VBD', 'WRB', 'IN')
q<sub>2</sub>| SentimentStanfordNLP ('negative', 'neutral')&rarr; POSTaggerSpacyLG ('PRP')
q<sub>3</sub>| SentimentStanfordNLP ('neutral', 'positive')&rarr; POSTaggerSpacyLG ('NNPS', 'VB', 'VBZ', 'WRB')
q<sub>4</sub>| SentimentStanfordNLP ('neutral', 'positive')&rarr; POSTaggerSpacySM ('VBD', 'WRB', 'PRP')
q<sub>5</sub>| SentimentStanfordNLP ('neutral', 'positive')&rarr; POSTaggerSpacySM ('PRP')
q<sub>6</sub>| SentimentStanfordNLP ('neutral', 'positive')&rarr; POSTaggerStanfordNLP ('NNPS', 'VBP', 'WRB', '.')&rarr; POSTaggerSpacyLG ('NNPS', 'VBD', 'VBN', 'WRB', 'DT')
q<sub>7</sub>| SentimentStanfordNLP ('positive')&rarr; POSTaggerStanfordNLP ('NNPS', 'VB', 'VBD', 'VBN')&rarr; POSTaggerSpacyLG ('NNPS', 'VB', 'VBZ', 'WRB')
q<sub>8</sub>| SentimentStanfordNLP ('neutral', 'positive')&rarr; POSTaggerStanfordNLP ('NNPS', 'VB', 'VBD', 'VBN')&rarr; POSTaggerSpacyLG ('NNPS', 'VBD', 'VBN', 'WRB', 'DT')
q<sub>9</sub>| SentimentStanfordNLP ('neutral', 'positive')&rarr; POSTaggerStanfordNLP ('NNPS', 'VB', 'VBD', 'VBN')&rarr; POSTaggerSpacyLG ('NNPS', 'VB', 'VBZ', 'WRB')
q<sub>10</sub>| SentimentStanfordNLP ('neutral')&rarr; POSTaggerStanfordNLP ('VBP', 'VBZ', 'WRB')&rarr; POSTaggerSpacyLG ('NNPS', 'VBD', 'VBN', 'WRB', 'DT')

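One way to read a row of this table is as a chain of predicates, each naming an operator and the labels it accepts, applied left to right. The sketch below encodes q<sub>1</sub> that way. It rests on assumptions this page does not confirm: that a record passes a predicate when the operator's output shares at least one label with the listed set, and that evaluation stops at the first failing predicate. The `Predicate` class, the `passes` function, and the stub models are hypothetical stand-ins, not CORE's or Texera's API; real runs would call the actual NLP models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Predicate:
    operator: str              # e.g. "SentimentStanfordNLP"
    accepted: FrozenSet[str]   # labels listed in parentheses in the table


def passes(query: List[Predicate], text: str,
           models: Dict[str, Callable[[str], Set[str]]]) -> bool:
    """Apply predicates in order; stop at the first one the record fails."""
    for pred in query:
        labels = models[pred.operator](text)
        # Assumed semantics: pass if any produced label is in the accepted set.
        if not (labels & pred.accepted):
            return False
    return True


# q1 from the table above.
q1 = [
    Predicate("SentimentStanfordNLP", frozenset({"negative", "neutral"})),
    Predicate("POSTaggerSpacyLG", frozenset({"VBD", "WRB", "IN"})),
]

# Stub models with fixed outputs, used only so the sketch runs on its own.
stub_models = {
    "SentimentStanfordNLP": lambda text: {"neutral"},
    "POSTaggerSpacyLG": lambda text: {"NN", "IN"},
}

print(passes(q1, "@carrieunderwood @opry hi carrie", stub_models))
```

Under these assumptions, a record rejected by the first predicate never reaches the second operator, so the number and order of predicates directly affect how many model calls a query makes.
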
## 2. Ten queries on the COCO dataset. 
### To study the behavior of CORE with different orders of predicates, we randomly select four pairs of queries. Each pair of queries contains two queries with different orders, such as q<sub>2</sub> and q<sub>3</sub>.
Id|Queries
:-------------------------:|:-------------------------
q<sub>1</sub>| ObjectDetection ('car', 'chair', 'dining table', 'bench', 'bed', 'bird', 'vase')&rarr; ObjectDetection ('person')
q<sub>2</sub>| ObjectDetection ('person')&rarr; ObjectDetection ('car', 'chair', 'cup', 'dog', 'handbag', 'sink', 'pizza')
q<sub>3</sub>| ObjectDetection ('car', 'chair', 'cup', 'dog', 'handbag', 'sink', 'pizza')&rarr; ObjectDetection ('person')
q<sub>4</sub>| ObjectDetection ('person')&rarr; ObjectDetection ('car', 'chair', 'cup', 'bottle', 'bed', 'cell phone', 'motorcycle')
q<sub>5</sub>| ObjectDetection ('car', 'chair', 'cup', 'bottle', 'bed', 'cell phone', 'motorcycle')&rarr; ObjectDetection ('person')
q<sub>6</sub>| ObjectDetection ('person')&rarr; ObjectDetection ('car', 'chair', 'cup', 'tv', 'bed', 'bench', 'sink')
q<sub>7</sub>| ObjectDetection ('car', 'chair', 'cup', 'tv', 'bed', 'bench', 'sink')&rarr; ObjectDetection ('person')
q<sub>8</sub>| ObjectDetection ('person')&rarr; ObjectDetection ('car', 'chair', 'bottle', 'bowl', 'handbag', 'book', 'bird')
q<sub>9</sub>| ObjectDetection ('car', 'chair', 'bottle', 'bowl', 'handbag', 'book', 'bird')&rarr; ObjectDetection ('person')
q<sub>10</sub>| ObjectDetection ('person')&rarr; ObjectDetection ('car', 'chair', 'dining table', 'book', 'surfboard', 'bird', 'vase')

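The pairing described above can be checked mechanically: two queries form a pair when they contain the same predicates in a different order. The sketch below encodes q<sub>1</sub>, q<sub>2</sub>, and q<sub>3</sub> from the table as lists of `(operator, labels)` tuples. This encoding and the function name `is_reordering` are assumptions of the example, not a format used by CORE.

```python
from typing import List, Tuple

Step = Tuple[str, Tuple[str, ...]]  # (operator name, accepted labels)

PERSON: Step = ("ObjectDetection", ("person",))

# Label sets copied from rows q1, q2, and q3 of the table above.
OBJECTS_Q1: Step = ("ObjectDetection",
                    ("car", "chair", "dining table", "bench", "bed", "bird", "vase"))
OBJECTS_Q2: Step = ("ObjectDetection",
                    ("car", "chair", "cup", "dog", "handbag", "sink", "pizza"))

q1: List[Step] = [OBJECTS_Q1, PERSON]
q2: List[Step] = [PERSON, OBJECTS_Q2]
q3: List[Step] = [OBJECTS_Q2, PERSON]


def is_reordering(a: List[Step], b: List[Step]) -> bool:
    """True when a and b hold the same predicates in a different order."""
    return a != b and sorted(a) == sorted(b)


print(is_reordering(q2, q3))  # q2 and q3 differ only in predicate order
print(is_reordering(q1, q2))  # q1 uses a different label set
```
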
## 3. Ten queries on the UCF101 dataset. 
### For the UCF101 dataset, we randomly select ten queries with strong correlations.
Id|Queries
:-------------------------:|:-------------------------
q<sub>1</sub>| ActivityRecognition ('Archery', 'BalanceBeam', 'Basketball', 'BandMarching', 'BasketballDunk', 'Biking', 'BreastStroke', 'BenchPress', 'BoxingPunchingBag', 'BlowDryHair', 'Bowling', 'BabyCrawling', 'ApplyLipstick')&rarr; ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'surfboard', 'bird')
q<sub>2</sub>| ActivityRecognition ('Archery', 'BalanceBeam', 'Basketball', 'BandMarching', 'Biking', 'BreastStroke', 'BrushingTeeth', 'BaseballPitch', 'BoxingPunchingBag', 'BoxingSpeedBag', 'Bowling', 'ApplyLipstick')&rarr; ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'surfboard', 'bird')
q<sub>3</sub>| ActivityRecognition ('Archery', 'BalanceBeam', 'Basketball', 'BandMarching', 'Biking', 'BodyWeightSquats', 'BreastStroke', 'BrushingTeeth', 'BaseballPitch', 'Bowling', 'BabyCrawling')&rarr; ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'surfboard', 'bird')
q<sub>4</sub>| ActivityRecognition ('Archery', 'BalanceBeam', 'Basketball', 'BasketballDunk', 'BlowingCandles', 'Biking', 'BreastStroke', 'BrushingTeeth', 'BlowDryHair', 'BoxingSpeedBag')&rarr; ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'surfboard', 'bird')
q<sub>5</sub>| ActivityRecognition ('Archery', 'BalanceBeam', 'Basketball', 'BlowingCandles', 'BreastStroke', 'BrushingTeeth', 'BaseballPitch', 'BenchPress', 'BlowDryHair', 'BoxingSpeedBag', 'Bowling')&rarr; ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'surfboard', 'bird')
q<sub>6</sub>| ActivityRecognition ('Archery', 'Basketball', 'BandMarching', 'BasketballDunk', 'Biking', 'BodyWeightSquats', 'BreastStroke', 'BoxingPunchingBag', 'BoxingSpeedBag', 'Bowling', 'ApplyLipstick')&rarr; ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'surfboard', 'bird')
q<sub>7</sub>| ActivityRecognition ('Archery', 'BalanceBeam', 'BasketballDunk', 'BlowingCandles', 'BodyWeightSquats', 'BreastStroke', 'BaseballPitch', 'BoxingPunchingBag', 'BoxingSpeedBag', 'Bowling', 'BabyCrawling')&rarr; ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'surfboard', 'bird')
q<sub>8</sub>| ActivityRecognition ('Archery', 'Basketball', 'BandMarching', 'BasketballDunk', 'BlowingCandles', 'Biking', 'BrushingTeeth', 'BaseballPitch', 'BenchPress', 'Bowling')&rarr; ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'surfboard', 'bird')
q<sub>9</sub>| ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'boat', 'cup')&rarr; ActivityRecognition ('Archery', 'BalanceBeam', 'Basketball', 'BandMarching', 'BasketballDunk', 'BlowingCandles', 'BodyWeightSquats', 'BreastStroke', 'BrushingTeeth', 'BaseballPitch', 'BoxingPunchingBag', 'BoxingSpeedBag', 'BabyCrawling', 'ApplyLipstick')
q<sub>10</sub>| ActivityRecognition ('Archery', 'BalanceBeam', 'Basketball', 'BandMarching', 'BasketballDunk', 'BlowingCandles', 'BodyWeightSquats', 'BreastStroke', 'BrushingTeeth', 'BaseballPitch', 'BoxingPunchingBag', 'BoxingSpeedBag', 'BabyCrawling', 'ApplyLipstick')&rarr; ObjectDetection ('chair', 'sports ball', 'dog', 'car', 'tv', 'horse', 'bicycle', 'skateboard', 'tennis racket', 'boat', 'cup')


GitHub link: https://github.com/apache/texera/discussions/3980
