Hey everyone,

I'm Mohamed Hossam, a recent CS graduate who's interested in database
systems. I'm currently writing a proposal for the "Backup/restore utility
for AsterixDB [ASTERIXDB-3697
<https://issues.apache.org/jira/browse/ASTERIXDB-3697>]" project for GSoC
2026. I emailed the potential mentor for this project a couple of weeks
ago, and he suggested that I run AsterixDB locally and explore the code
in asterixdb-metadata. I followed his advice, and I'm excited to share
what I've learned.

I built a very minimal prototype that can query AsterixDB and
generate basic CREATE and INSERT statements from its data. You can find it
at [m0hossam/asterixdb-dump
<https://github.com/m0hossam/asterixdb-dump/>]. I used FasterXML's Jackson
JSON parser and tried to recreate the metadata objects from the parsed
JSON. Of course, this is only a proof of concept: I deliberately ignored
complex statements to get the prototype up and running quickly and to test
the feasibility of the project. The actual implementation will require a
deeper understanding of AsterixDB's metadata and catalog.

My next steps would be:

   - Dive deeper into asterixdb-metadata and gain a better understanding of
   the data model.
   - Potentially contribute to AsterixDB if I find something to improve in
   the relevant code areas.
   - Write automated tests that run the same queries against the original
   database and against a database restored from my prototype's dump, and
   compare the results to verify the dump's integrity.


I have two questions regarding the project:

   1. Should I send my technical proposal to the project's potential mentor
   for review before submitting it through the official GSoC website?
   2. The project size is listed as "~350 hour (large)". What does this
   mean in terms of time commitment? Will the project have an extended
   timeline, or does it require 8 hours of work per day during the three
   months of coding?


Best regards,
Mohamed Hossam
