[incubator-sdap-in-situ-data-services] branch master updated (b68a24f -> 4e65e0e)

nchung Wed, 30 Mar 2022 11:05:08 -0700

This is an automated email from the ASF dual-hosted git repository.

nchung pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-sdap-in-situ-data-services.git.



    from b68a24f  SDAP-354: Import initial version of in situ data services
     new d237005  Initial commit
     new 23a0f94  feat: add WIP parquet flask server
     new 52a6058  chore: some updates
     new 3efd940  feat: wip: update spark to upload file to s3
     new a877b29  feat: upgrade to hadoop 3.2.0 for s3 connection
     new ae7c43d  feat: update spark config library for aws
     new 867b392  feat: update code for flask to work with cluster
     new 9003fbd  fix: update PythonUtils.getPythonAuthSocketTimeout error
     new 6adb579  feat: add file logger
     new 32ca75f  feat: avoid commiting aws credentials
     new 2afb10a  feat: add endpoint to download file from s3 to upload to parquet
     new 88bb3c6  fix: need aws variables in env + update file delete function
     new 152949f  feat: add query logic + refactor some code
     new fbc5c70  feat: add query endpoint to flask
     new 1df354b  feat: add tag method + set url first before executing methods
     new 38012aa  feat: add year & month for partitions
     new 7ac0487  feat: tag ingested s3 objects
     new 1e50ed7  chore: update formate for easier readability
     new 7ec98e5  feat: update schema + add schema for parquet
     new 07807b0  chore: add more files to ignore
     new 45014ec  breaking: update ingest logic with updated schema
     new 2697d99  breaking: update ingest endpoint with updated schema and logic
     new b769827  fix: disable validating individual observation record coz it is too slow
     new 1c92bca  chore: move the logger statement outside of the loop
     new 765e966  feat: add new query class which can handle more queries & pagination
     new 8ad6b3c  breaking: update endpoint to work with new query class
     new cef3f20  fix: use pyspark to remove columns
     new 21998d2  feat: allow query to choose only specific columns
     new 532b3f1  feat: use append mode again + partition by job-id + remove ingested date
     new d9864b2  fix: update schema and parse method to include columns
     new d632b45  feat: add method & endpoint to replace file
     new edf682e  chore: update readme
     new c32b2ad  feat: update aws creds w/o token
     new ec4ce11  fix: s3-download parameter is in wrong order + update to custom ports
     new 6cd04f5  fix: add log stmts + fix typo
     new 703440f  chore: update log stmt
     new a482b06  feat: check session_token is null b4 setting it
     new a412dad  chore: add s3 log stmts
     new 3fb0c7b  fix: wrong argument when calling s3 download method
     new 086818f  chore: more log stmt
     new 4af5f92  fix: need to re-use initialized s3 class. not new one
     new 19af269  fix: schema is updated. where to find data types also need to be updated
     new 983289a  feat: use fastjsonschema + parallel process to validate large json arrays
     new 3a8f351  fix: calling s3 class twice in replace_json_s3 endpoint
     new ee85c85  feat: add get endpoint + add doms compatible endpoint
     new 428b09f  feat: add platform_code,variable,quality_flag
     new ac12689  feat: adding platform code to the columns + make it a partition
     new 41f4066  feat: adding ddb logic (wip) + refactor how to receive aws cred
     new e11eddd  feat: finished creating ddb classes
     new 5e5129b  feat:add metadata to ddb tbl
     new ad1d2be  feat: unzip s3 file if it is zipped
     new 71dc71e  feat: extract ingest aws json file logic to its own class
     new c124480  fix: allow get method + start_from & size needs to be int
     new 48884f1  fix:add unique temporary folder + create it
     new 03e585f  feat: add pagination to doms response
     new 3a323c8  fix: resource & client are not methods
     new 53ffde1  fix: insert record in ddb needs to update logic
     new 8ca9eb6  chore: update ddb class name
     new aced8af  fix:if expecting millisecond, convert to float first
     new c37460d  feat: validate ingest/replace against DDB first + add stream logger + get log_level from env
     new 09c1ec3  fix: attempt to reduce query time
     new 88767eb  breaking: upgrade to python3.9 dependencies + junk code to try to speed up Parquet
     new e09f47f  fix: hardcoding total number for now
     new 91451cf  chore: small tweak to spark executor RAM to compare performance
     new d2ab730  chore:increase more resources
     new bda09cf  feat: map local directory to all services in docker-compose + change Parquet storage to local
     new 3f879c9  fix: adding missing platform_code
     new 3034197  fix: update spark parameters for k8s spark cluster
     new 123afcb  Initial SwaggerUI deployment + OpenAPI spec
     new b6e2084  Fix startTime/endTime examples
     new 5425432  Merge pull request #1 from access-cdms/CDMS-79
     new 1a565aa  Merge branch 'master' of github.jpl.nasa.gov:access-cdms/in-situ-data-services
     new 71f5fda  feat: add month to sql filter + s3 list children method
     new b9ca714  feat: add provider, project to doms api
     new 06987e5  feat: validate sha512 before ingestion
     new e7c75dd  fix: sha512 bug + disable tagging + add more info in response
     new 235b3ca  chore: add more details on response
     new e5bd7db  fix: need to extract sha512 b4 comparing
     new bdab0af  feat: update code for k8s spark + instruction to setup k8s spark
     new cd9a334  Added Apache 2.0 license.
     new d5fb0c6  Merge branch 'master' of github.jpl.nasa.gov:access-cdms/in-situ-data-services
     new 4478192  feat: remove brackets in doms get parameters for bbox
     new 5e3b2dc  fix: replace json array with comma separated str for normal query as well + update descriptions
     new 4970959  chore: merge from forked_apache master
     new 52b47e2  fix: relace big jar with text file
     new d60d304  chore: remove old hadoop libraries
     new 5cf4c86  chore: update ignore file to remove aws library + removed aws jar from git history
     new 7bf15ea  chore: merged from origin master
     new 321a05f  fix: more options when connecting to spark
     new 9ce9573  chore: use class variable to avoid typo
     new 6db2449  fix: add spark.driver.host to talk to k8s sparl
     new 60279ca  feat: add simple k8s for parquet
     new 1c43877  chore: update configmap value + move values.yaml to another location
     new f8a7b73  chore: use docker.io image tag
     new 00954d2  feat: add jupyter notebook for demo
     new 506ddda  fix: add sample response
     new 2ea30e7  chore: add more details
     new 04fa46d  feat: allow default boto3 session for iam base roles
     new 9a27f94  chore: update file for pep-8
     new 7167ab2  feat: prep for spark3.2.0
     new d5f93c9  fix: allow aws token from secret file
     new f379e8b  feat: add missing depth value condition
     new 1a092da  chore: add raw query for debugging purpose
     new 21b5e8f  fix: allow spark config come from env file
     new 29daaea  feat: add extra spark setting
     new 646a696  fix: validate NULL before type checking
     new 1934c81  fix: update spark aws cred logic
     new 9e89d68  fix: add missing depth condition with "OR" statement
     new dcf09c9  chore: update values.yaml with EKS values
     new ff244b6  chore: update readme
     new 6342e5e  chore: add helm scripts (in-progress)
     new 215589f  fix: spark_config_dict needs a dict + aws creds directly from values.yaml now
     new 22a3380  chore: saving progress
     new ba849e5  fix: add condition to set empty secret or real one
     new 812aa7d  chore: add readme + update git ignore
     new 800e3c4  fix: unable to query the service, only the pod before this fix
     new 6afee71  feat: add auth header to ingest new files
     new 11c532e  fix: typos + update docker with file base auth for now
     new 7756a99  feat: add comma separated variable and columns to the query parquet logic
     new 5d06697  feat: add column
     new 375379e  chore: update docker tag
     new 2330ff4  fix: update docker-compose dockerfile
     new 7362eb2  fix: ddb name comes from setting
     new 5f0f6f9  chore: add deployment guide
     new c9d8ca3  chore: move docker files to docker directory
     new 5f412f6  chore: move jupyter notebook to documentaiton directory
     new af71dba  feat: add aws lambda code to ingest data from S3 to parquet (not.tested)
     new 1a7c5d4  chore: move flask server starting script to the module
     new e1ef03f  feat: add terraform (in.progress)
     new ca2d1ef  feat: add bench_mark tests
     new 0d2c219  feat: add new type of bench_mark
     new 6131ebc  feat: add addition key to run ingest in background
     new bc9090b  fix: accepting s3_url from event for now
     new e08faa5  fix: update ingest lambda header
     new 67b5f1e  fix: compute sha512 from original s3 file + bug on retrieving optional parameter in ingest endpoints
     new 139754b  chore:adding documentation for performance issue
     new 66c7931  chore: update rebuilding of images by shuffling the stacks
     new bc0a17b  fix: repartition to reduce number of ingested files in parquet + overwrite vs. append for replace vs. insert
     new 1bf8f4c  chore: update time filter
     new 98f8844  feat: add partition to the parquet path to increase speed (#44)
     new 1530115  fix: return correct size for last page (need count to do it)
     new f27cf93  chore: add test result
     new 6580a34  chore: remove comment + update test
     new 672e41a  feat: update swagger with latest changes
     new 9788e7f  fix: allow URL ending with `/` also works
     new 29733d8  fix: disable redirect if `/` is not in the URL
     new 496a5f3  fix: rename apidocs to a unique name to avoid weird bug pointing to sdap swagger
     new c78cb4b  feat: remove old domains + add provider & project
     new f02ddff  feat: add platform code
     new d459341  feat: multiple dataframe read & union all + multiple selective month (#47)
     new 190af10  chore: update benchmark results
     new 9e36b5d  feat: accept multiple platform values separated by comma (#54)
     new 76bbc48  fix: update bug in dataframe union
     new 4e65e0e  Merge pull request #1 from wphyojpl/master

The 155 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
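Commits 06987e5, e7c75dd, e5bd7db and 67b5f1e deal with validating a SHA-512 checksum of the original S3 file before ingestion. The following is only a minimal sketch of that kind of check, not the code in parquet_flask; the function names are hypothetical, and it assumes the expected digest is available as a hex string:

```python
import hashlib
import io
from typing import BinaryIO


def sha512_of_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-512 digest of a binary stream (hypothetical helper)."""
    digest = hashlib.sha512()
    # Read fixed-size chunks so a large object need not fit in memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def should_ingest(stream: BinaryIO, expected_sha512: str) -> bool:
    """Hypothetical gate: proceed with ingestion only when the digests match."""
    # hexdigest() is lowercase, so normalise the expected value the same way.
    return sha512_of_stream(stream) == expected_sha512.strip().lower()


if __name__ == "__main__":
    payload = b'{"observations": []}'
    expected = hashlib.sha512(payload).hexdigest()
    print(should_ingest(io.BytesIO(payload), expected))         # True
    print(should_ingest(io.BytesIO(payload + b" "), expected))  # False
```

Streaming in chunks keeps memory use flat for large inputs; any object with a compatible read(size) method, such as a boto3 StreamingBody, should work the same way.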


Summary of changes:
 .gitignore                                         |    8 +-
 Deployment-in-AWS.md                               |  117 +++
 docker/parquet.spark.3.1.2.r70.Dockerfile          |   19 +
 docker/parquet.spark.3.2.0.r44.Dockerfile          |   21 +
 .../jupyter.notebooks/cdms.demo.2021.12.09.ipynb   | 1089 ++++++++++++++++++++
 .../in-situ-architecture-demo.png                  |  Bin 0 -> 186934 bytes
 .../jupyter.notebooks/in-situ-architecture.png     |  Bin 0 -> 170538 bytes
 .../in-situ-parquet-partition.png                  |  Bin 0 -> 45606 bytes
 .../jupyter.notebooks/in-situ-s3-input-data.png    |  Bin 0 -> 439347 bytes
 k8s_spark/README.md                                |   78 ++
 k8s_spark/configmap.yml                            |   10 +
 k8s_spark/flask-deployment.yml                     |   72 ++
 k8s_spark/flask-service.yml                        |   15 +
 k8s_spark/k8s_spark/values.yaml                    |  694 +++++++++++++
 k8s_spark/parquet.spark.helm/.helmignore           |   23 +
 k8s_spark/parquet.spark.helm/Chart.yaml            |   24 +
 k8s_spark/parquet.spark.helm/README.md             |   14 +
 k8s_spark/parquet.spark.helm/templates/NOTES.txt   |   22 +
 .../parquet.spark.helm/templates/_helpers.tpl      |   62 ++
 .../parquet.spark.helm/templates/deployment.yaml   |   97 ++
 k8s_spark/parquet.spark.helm/templates/hpa.yaml    |   28 +
 .../parquet.spark.helm/templates/ingress.yaml      |   41 +
 k8s_spark/parquet.spark.helm/templates/secret.yaml |    9 +
 .../parquet.spark.helm/templates/service.yaml      |   16 +
 .../templates/serviceaccount.yaml                  |   12 +
 .../templates/tests/test-connection.yaml           |   15 +
 k8s_spark/parquet.spark.helm/values.yaml           |   92 ++
 local.spark.cluster/README.md                      |   17 +
 local.spark.cluster/aws-java-sdk-1.7.4.jar         |  Bin 0 -> 11948376 bytes
 .../aws-java-sdk-bundle-1.11.563.jar.txt           |    1 +
 local.spark.cluster/build.sh                       |   33 +
 local.spark.cluster/cluster-base.Dockerfile        |   19 +
 local.spark.cluster/docker-compose.yml             |   82 ++
 local.spark.cluster/hadoop-aws-3.2.0.jar           |  Bin 0 -> 480674 bytes
 local.spark.cluster/jupyterlab.Dockerfile          |   16 +
 local.spark.cluster/parquet-flask.Dockerfile       |   17 +
 local.spark.cluster/spark-base.Dockerfile          |   36 +
 local.spark.cluster/spark-defaults.conf            |   52 +
 local.spark.cluster/spark-master.Dockerfile        |    8 +
 local.spark.cluster/spark-worker.Dockerfile        |    8 +
 flask_server.py => parquet_flask/__main__.py       |    0
 parquet_flask/authenticator/__init__.py            |    0
 .../authenticator/authenticator_abstract.py        |   12 +
 .../authenticator_aws_secret_manager.py            |   37 +
 .../authenticator/authenticator_factory.py         |   18 +
 .../authenticator/authenticator_filebased.py       |   33 +
 .../authenticator/authenticator_pass_through.py    |   11 +
 parquet_flask/aws/aws_cred.py                      |   38 +-
 parquet_flask/aws/aws_ddb.py                       |    4 +-
 parquet_flask/aws/aws_s3.py                        |   31 +-
 parquet_flask/aws/aws_secret_manager.py            |   46 +
 parquet_flask/cdms_lambda_func/__init__.py         |    0
 .../cdms_lambda_func/ingest_s3_to_cdms/__init__.py |    0
 .../ingest_s3_to_cdms/execute_lambda.py            |    6 +
 .../ingest_s3_to_cdms/ingest_s3_to_cdms.py         |   50 +
 parquet_flask/cdms_lambda_func/lambda_func_env.py  |    5 +
 parquet_flask/io_logic/cdms_constants.py           |    4 +
 parquet_flask/io_logic/cdms_schema.py              |   42 +
 parquet_flask/io_logic/ingest_new_file.py          |   39 +-
 parquet_flask/io_logic/metadata_tbl_io.py          |    3 +-
 .../parquet_query_condition_management_v3.py       |  232 +++++
 parquet_flask/io_logic/partitioned_parquet_path.py |  130 +++
 parquet_flask/io_logic/query.py                    |  157 ---
 parquet_flask/io_logic/query_v2.py                 |  146 +--
 parquet_flask/io_logic/query_v4.py                 |  136 +++
 parquet_flask/io_logic/raw_query.py                |  127 +++
 parquet_flask/io_logic/retrieve_spark_session.py   |   85 +-
 parquet_flask/io_logic/spark_constants.py          |    8 +
 parquet_flask/utils/config.py                      |   35 +-
 parquet_flask/utils/general_utils.py               |   35 +
 parquet_flask/v1/__init__.py                       |    3 +-
 parquet_flask/v1/authenticator_decorator.py        |   23 +
 parquet_flask/v1/ingest_aws_json.py                |  156 ++-
 parquet_flask/v1/ingest_json_s3.py                 |    8 +
 .../v1/{apidocs.py => insitu_query_swagger.py}     |   14 +-
 .../{apidocs => insitu_query_swagger}/index.html   |    0
 .../insitu-spec-0.0.1.yml                          |  190 ++--
 parquet_flask/v1/query_data.py                     |   44 +-
 parquet_flask/v1/query_data_doms.py                |   26 +-
 parquet_flask/v1/replace_json_s3.py                |    8 +
 s3a.parquet.performance.issue.md                   |  121 +++
 setup.py                                           |    1 +
 terraform/cdms-parquet-tf/ddb.tf                   |   38 +
 terraform/cdms-parquet-tf/eks.tf                   |    0
 terraform/cdms-parquet-tf/lambda.tf                |    0
 terraform/cdms-parquet-tf/main.tf                  |   14 +
 terraform/cdms-parquet-tf/s3.tf                    |   36 +
 terraform/cdms-parquet-tf/variables.tf             |   19 +
 terraform/cmd-paruqet.tf                           |   22 +
 terraform/main.tf                                  |    4 +
 terraform/variables.tf                             |   19 +
 tests/__init__.py                                  |    0
 tests/bench_mark/__init__.py                       |    0
 tests/bench_mark/bench_mark.py                     |  527 ++++++++++
 tests/bench_mark/func_exec_time_decorator.py       |   17 +
 tests/parquet_flask/__init__.py                    |    0
 tests/parquet_flask/io_logic/__init__.py           |    0
 .../test_parquet_query_condition_management_v3.py  |  373 +++++++
 .../io_logic/test_partitioned_parquet_path.py      |   14 +
 tests/parquet_flask/utils/__init__.py              |    0
 tests/parquet_flask/utils/test_general_utils.py    |   26 +
 101 files changed, 5541 insertions(+), 499 deletions(-)
 create mode 100644 Deployment-in-AWS.md
 create mode 100644 docker/parquet.spark.3.1.2.r70.Dockerfile
 create mode 100644 docker/parquet.spark.3.2.0.r44.Dockerfile
 create mode 100644 documentations/jupyter.notebooks/cdms.demo.2021.12.09.ipynb
 create mode 100644 documentations/jupyter.notebooks/in-situ-architecture-demo.png
 create mode 100644 documentations/jupyter.notebooks/in-situ-architecture.png
 create mode 100644 documentations/jupyter.notebooks/in-situ-parquet-partition.png
 create mode 100644 documentations/jupyter.notebooks/in-situ-s3-input-data.png
 create mode 100644 k8s_spark/README.md
 create mode 100644 k8s_spark/configmap.yml
 create mode 100644 k8s_spark/flask-deployment.yml
 create mode 100644 k8s_spark/flask-service.yml
 create mode 100644 k8s_spark/k8s_spark/values.yaml
 create mode 100644 k8s_spark/parquet.spark.helm/.helmignore
 create mode 100644 k8s_spark/parquet.spark.helm/Chart.yaml
 create mode 100644 k8s_spark/parquet.spark.helm/README.md
 create mode 100644 k8s_spark/parquet.spark.helm/templates/NOTES.txt
 create mode 100644 k8s_spark/parquet.spark.helm/templates/_helpers.tpl
 create mode 100644 k8s_spark/parquet.spark.helm/templates/deployment.yaml
 create mode 100644 k8s_spark/parquet.spark.helm/templates/hpa.yaml
 create mode 100644 k8s_spark/parquet.spark.helm/templates/ingress.yaml
 create mode 100644 k8s_spark/parquet.spark.helm/templates/secret.yaml
 create mode 100644 k8s_spark/parquet.spark.helm/templates/service.yaml
 create mode 100644 k8s_spark/parquet.spark.helm/templates/serviceaccount.yaml
 create mode 100644 k8s_spark/parquet.spark.helm/templates/tests/test-connection.yaml
 create mode 100644 k8s_spark/parquet.spark.helm/values.yaml
 create mode 100644 local.spark.cluster/README.md
 create mode 100644 local.spark.cluster/aws-java-sdk-1.7.4.jar
 create mode 100644 local.spark.cluster/aws-java-sdk-bundle-1.11.563.jar.txt
 create mode 100755 local.spark.cluster/build.sh
 create mode 100644 local.spark.cluster/cluster-base.Dockerfile
 create mode 100644 local.spark.cluster/docker-compose.yml
 create mode 100644 local.spark.cluster/hadoop-aws-3.2.0.jar
 create mode 100644 local.spark.cluster/jupyterlab.Dockerfile
 create mode 100644 local.spark.cluster/parquet-flask.Dockerfile
 create mode 100644 local.spark.cluster/spark-base.Dockerfile
 create mode 100644 local.spark.cluster/spark-defaults.conf
 create mode 100644 local.spark.cluster/spark-master.Dockerfile
 create mode 100644 local.spark.cluster/spark-worker.Dockerfile
 rename flask_server.py => parquet_flask/__main__.py (100%)
 create mode 100644 parquet_flask/authenticator/__init__.py
 create mode 100644 parquet_flask/authenticator/authenticator_abstract.py
 create mode 100644 parquet_flask/authenticator/authenticator_aws_secret_manager.py
 create mode 100644 parquet_flask/authenticator/authenticator_factory.py
 create mode 100644 parquet_flask/authenticator/authenticator_filebased.py
 create mode 100644 parquet_flask/authenticator/authenticator_pass_through.py
 create mode 100644 parquet_flask/aws/aws_secret_manager.py
 create mode 100644 parquet_flask/cdms_lambda_func/__init__.py
 create mode 100644 parquet_flask/cdms_lambda_func/ingest_s3_to_cdms/__init__.py
 create mode 100644 parquet_flask/cdms_lambda_func/ingest_s3_to_cdms/execute_lambda.py
 create mode 100644 parquet_flask/cdms_lambda_func/ingest_s3_to_cdms/ingest_s3_to_cdms.py
 create mode 100644 parquet_flask/cdms_lambda_func/lambda_func_env.py
 create mode 100644 parquet_flask/io_logic/cdms_schema.py
 create mode 100644 parquet_flask/io_logic/parquet_query_condition_management_v3.py
 create mode 100644 parquet_flask/io_logic/partitioned_parquet_path.py
 delete mode 100644 parquet_flask/io_logic/query.py
 create mode 100644 parquet_flask/io_logic/query_v4.py
 create mode 100644 parquet_flask/io_logic/raw_query.py
 create mode 100644 parquet_flask/io_logic/spark_constants.py
 create mode 100644 parquet_flask/v1/authenticator_decorator.py
 rename parquet_flask/v1/{apidocs.py => insitu_query_swagger.py} (73%)
 rename parquet_flask/v1/{apidocs => insitu_query_swagger}/index.html (100%)
 rename parquet_flask/v1/{apidocs => insitu_query_swagger}/insitu-spec-0.0.1.yml (71%)
 create mode 100644 s3a.parquet.performance.issue.md
 create mode 100644 terraform/cdms-parquet-tf/ddb.tf
 create mode 100644 terraform/cdms-parquet-tf/eks.tf
 create mode 100644 terraform/cdms-parquet-tf/lambda.tf
 create mode 100644 terraform/cdms-parquet-tf/main.tf
 create mode 100644 terraform/cdms-parquet-tf/s3.tf
 create mode 100644 terraform/cdms-parquet-tf/variables.tf
 create mode 100644 terraform/cmd-paruqet.tf
 create mode 100644 terraform/main.tf
 create mode 100644 terraform/variables.tf
 create mode 100644 tests/__init__.py
 create mode 100644 tests/bench_mark/__init__.py
 create mode 100644 tests/bench_mark/bench_mark.py
 create mode 100644 tests/bench_mark/func_exec_time_decorator.py
 create mode 100644 tests/parquet_flask/__init__.py
 create mode 100644 tests/parquet_flask/io_logic/__init__.py
 create mode 100644 tests/parquet_flask/io_logic/test_parquet_query_condition_management_v3.py
 create mode 100644 tests/parquet_flask/io_logic/test_partitioned_parquet_path.py
 create mode 100644 tests/parquet_flask/utils/__init__.py
 create mode 100644 tests/parquet_flask/utils/test_general_utils.py
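
Several commits (38012aa, 532b3f1, ac12689, 98f8844) and the new file parquet_flask/io_logic/partitioned_parquet_path.py relate to partitioning the Parquet output so that queries read fewer files. Spark writes partitioned data in a Hive-style "column=value" directory layout, so a reader that knows the leading partition values can skip whole directories. A minimal sketch of building such a path prefix follows; the function, bucket and column ordering are hypothetical and not taken from the repository:

```python
from typing import Optional, Sequence, Tuple


def partition_prefix(base: str, parts: Sequence[Tuple[str, Optional[object]]]) -> str:
    """Build a Hive-style 'column=value' path prefix under `base` (hypothetical).

    `parts` lists partition columns in the order they were written. Building
    stops at the first column whose value is None, since a directory prefix
    can only pin down a leading run of the partition hierarchy.
    """
    segments = [base.rstrip("/")]
    for column, value in parts:
        if value is None:
            break
        segments.append(f"{column}={value}")
    return "/".join(segments)


if __name__ == "__main__":
    # Hypothetical bucket and values; month is unknown, so the prefix ends at year.
    print(partition_prefix(
        "s3a://example-bucket/parquet/",
        [("provider", "example_provider"), ("project", "example_project"),
         ("platform_code", 41), ("year", 2017), ("month", None)],
    ))
```

The sketch does not escape special characters such as "/" or "=" inside values; real partition directories may encode them.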
