Hey everyone,

I'm Mohamed Hossam, a recent CS graduate interested in database systems. I'm currently writing a proposal for the "Backup/restore utility for AsterixDB [ASTERIXDB-3697 <https://issues.apache.org/jira/browse/ASTERIXDB-3697>]" project for GSoC 2026. I emailed the project's potential mentor a couple of weeks ago, and he suggested that I run AsterixDB locally and investigate the code in asterixdb-metadata. I did as he suggested, and I'm excited to share what I learnt.

I managed to create a very minimal prototype that queries AsterixDB and generates basic CREATE and INSERT statements from its data. You can find my work at: [m0hossam/asterixdb-dump <https://github.com/m0hossam/asterixdb-dump/>]. I used FasterXML's Jackson JSON parser and tried to recreate the metadata objects from the parsed JSON. Of course, this is only a proof of concept: I'm deliberately ignoring complex statements to get the prototype up and running and test the feasibility of the project. The actual implementation will require a deeper understanding of AsterixDB's metadata and catalog.

My next steps would be:

- Dive deeper into asterixdb-metadata and gain a better understanding of the data model.
- Potentially contribute to AsterixDB if I find something to improve in the relevant code areas.
- Write automated unit tests that compare query results from the original database with query results from the database restored from my prototype's dump, to verify that the data is preserved.

I have two questions regarding the project:

1. Should I send my technical proposal to the potential mentor of the project for review before submitting it through the official website?
2. The project size is listed as "~350 hour (large)". What does this mean in terms of time commitment? Will the project have an extended timeline, or does it require 8 hours of work per day during the 3 months of coding?

Best regards,
Mohamed Hossam
