*Role of Python in ETL - Deepak*

ETL - Extract, Transform and Load
Python is used to customize ETL because it is easy to read, write and execute.

Python ETL tools:
1. Petl - Example: converting a 4x2 array to HTML
2. Pandas - Has many features for ETL, including transformations and conditionals. Example: load the data from a file, find duplicates and send them to another file
3. Mara - Lightweight, with a web-based UI (special feature)
4. Apache Airflow - Created by Airbnb; based on DAGs (Directed Acyclic Graphs)
5. Pyspark - Big data tool; data streaming, and ML on top of streaming
6. Bonobo - Supports parallel processing
7. Luigi - Created by Spotify, for enterprise-level solutions (more incoming data every minute)
8. AVIK Cloud - Not open source. Python can be implemented directly. It is a software-as-a-service product

Doubts:
Ashok: Can you explain what a pipeline is?
Deepak: For enterprise-level activity I have multiple operations running in parallel, so pipelines are created.

*Introduction to Cyclic Redundancy Check (CRC) - Ashok*

Also called frame check sequence.

Types of errors:
1. Single-bit error
2. Burst error

Error detection in computer networks - add redundancy bits:
1. A basic example - transmitting 1000 bits from CP1 to CP2, plus 125 redundancy bits
2. State machine
3. Two-dimensional parity check - for 32 bits, 13 extra bits are sent
4. Checksum - binary addition
5. CRC - most commonly used in digital systems
   i) State machine
   ii) CRC computation - XOR in polynomial division
   iii) Code walkthrough

Conclusion - we may get a CRC error if we open the hard drive.

*Open Slot:*

Ashok - Documentary suggestions: Prediction by the Numbers and The Code, both available on Netflix
Rengaraj - PyCon India 2020
Pradeep - MIT OpenCourseWare https://ocw.mit.edu/index.htm
Vijay Ravider - Python modules used in infrastructure-based provisioning services
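
For reference, the Pandas example from the ETL talk (load data from a file, find duplicates, send them to another file) could look roughly like the sketch below. This is not the code shown in the talk: the function name `split_duplicates`, the column names and the sample rows are made up for illustration, and in-memory buffers stand in for real files so the snippet runs on its own.

```python
import io

import pandas as pd


def split_duplicates(source, duplicates_out):
    """Hypothetical helper: write repeated rows to `duplicates_out`
    and return the de-duplicated data."""
    df = pd.read_csv(source)                 # extract
    # transform: flag every copy of a row that appears more than once
    dup_mask = df.duplicated(keep=False)
    # load: send the duplicate rows to the second file
    df[dup_mask].to_csv(duplicates_out, index=False)
    return df.drop_duplicates()


# Made-up sample data; StringIO buffers stand in for file paths.
raw = io.StringIO("id,name\n1,asha\n2,bala\n1,asha\n3,chitra\n")
dups = io.StringIO()
clean = split_duplicates(raw, dups)
```

Passing file paths instead of the buffers should work the same way, since `read_csv` and `to_csv` accept both.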
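
For reference, the "CRC computation - XOR in polynomial division" step from the CRC talk can be sketched in a few lines of Python. This is not the code walked through in the talk: the function names, the bit-string representation and the 4-bit generator `1011` are assumptions chosen to keep the example small.

```python
def mod2_divide(bits: str, generator: str) -> str:
    """Modulo-2 long division: return the last len(generator) - 1 bits
    left after XOR-ing the generator under every leading 1."""
    n_check = len(generator) - 1
    work = list(bits)
    for i in range(len(bits) - n_check):
        if work[i] == "1":
            for j, g in enumerate(generator):
                # XOR of two bits: equal bits give 0, different bits give 1
                work[i + j] = "0" if work[i + j] == g else "1"
    return "".join(work[-n_check:])


def crc_bits(data: str, generator: str) -> str:
    """Sender side: append zeros to the data, divide, keep the remainder."""
    return mod2_divide(data + "0" * (len(generator) - 1), generator)


def crc_ok(frame: str, generator: str) -> bool:
    """Receiver side: a frame is accepted when the remainder is all zeros."""
    return "1" not in mod2_divide(frame, generator)


# Assumed example values (not from the talk).
data = "11010011101100"
generator = "1011"
check = crc_bits(data, generator)
frame = data + check

# Flip one bit to simulate a single-bit error in transit.
corrupted = frame[:5] + ("1" if frame[5] == "0" else "0") + frame[6:]
```

Here `crc_ok(frame, generator)` accepts the untouched frame and `crc_ok(corrupted, generator)` rejects the one with a flipped bit, which is the single-bit error case listed in the notes.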
_______________________________________________
Chennaipy mailing list
Chennaipy@python.org
https://mail.python.org/mailman/listinfo/chennaipy